1 Introduction


Although the applications of statistics in physics are numerous, many
physicists’ knowledge of the subject is uneven, albeit sometimes including a
detailed knowledge of some practical technique. It was felt, therefore, that a
short work would be useful if it provided a brief, but fairly systematic, guide
to the more commonly used statistical ideas and techniques, together with
enough theoretical background to relate one idea to another, so that the
whole does not just become a collection of “magic formulas”. This is the aim
of these notes, and they are intended to be useful for the working physicist,
both experimental and theoretical, as well as students at both graduate and
undergraduate levels. However, they are not intended to imitate a class
textbook, either in scope or rigour.

With respect to scope, the book attempts to cover those areas which
are more frequently used by physicists, rather than those invariably found in
statistical textbooks. With respect to rigour, the author is a physicist, and
although the proofs of theorems which are given will satisfy most other
physicists, they may not always be totally acceptable to the professional
statistician.

Firstly, then, what do we mean by statistics? We may define “statistics” as 
that branch of scientific method which deals with data obtained by counting, 
or measuring, the properties of populations, where by “populations” we mean
the collection of observations of a common attribute, e.g. the masses of a set 
of particles. Such populations are essentially numerical, and we will not 
normally consider the collection of particles themselves as a population. This 
is in accord with our everyday view of statistics, but it is interesting to note 
that this view has not always been held. For example, in 1770 statistics was 
defined in one book as, “The science that teaches us what is the political 
arrangement of all the modern states of the known world”, and as late as 
1834 we find, in the founding prospectus of the Royal Statistical Society, 
the following definition, “Statistics may be said to be the ascertaining and 
bringing together of those facts which are calculated to illustrate the
conditions and prospects of society”. However, despite such origins, statistics
has rapidly become an extensive and highly developed branch of numerical 
science, concerned not just with data manipulation, but with such questions 
as the design of experiments, the principles of decision making, and many 
others, all of which have relevance to physics.

The plan of this book is as follows. To begin with, there is a very short
introductory chapter on probability theory. A discussion of probability 
theory is an essential prerequisite before the subject of statistics can be 
meaningfully introduced, but no attempt has been made to give a detailed 
treatment of the topic, and the discussion instead relies largely on intuition. 
This brief chapter is followed by a discussion of theoretical distributions, 
Chapter 3 being concerned with the basic ideas and definitions, including 
definitions of those parameters which are used to characterize any finite 
population, and Chapter 4 describes the properties of several distributions 
frequently met in practice, together with others useful for illustrative purposes. 
Sampling theory is treated in the two chapters that follow. Chapter 5 is 
concerned with theoretical results, but at the end of this chapter the link is 
made between theoretical statistics and the experimental situation. Chapter 
6 concentrates on the properties of three important sampling distributions 
associated with the normal distribution. These are the $\chi^2$, $F$ and Student $t$
distributions. There follow three chapters on the important practical topic of
point estimation, i.e. the estimation of the values of parameters. The first of
these, Chapter 7, opens with a discussion of the general properties required
of point estimators, and then describes the method of estimation known as
Maximum Likelihood. Although the maximum likelihood method is, in a
sense, the most general method of point estimation, nevertheless other
methods are also widely used. One which is very popular is that of
Least-Squares, and this is described in some detail in Chapter 8. The third of
these three chapters, Chapter 9, gives a brief discussion of some other
methods of point estimation. Point estimation has always, in practice, to be
supplemented by a statement about the error associated with the estimate.
This is the problem of interval estimation and is considered in Chapter 10,
with particular reference to obtaining confidence intervals for the
parameters of the important Normal distribution. Besides estimation the other
main branch of statistics is that of hypothesis testing, and this is treated in 
one longer chapter, Chapter 11. 

The eleven chapters are followed by a Bibliography and four Appendices,
the latter being included in an attempt to make the book reasonably
self-contained. The Bibliography is short and contains only those works which
I have found particularly clear and useful. In Appendix A, Miscellaneous
Mathematics, there are collected together various mathematical results and
notations used in the text. Although all these topics have been met in the
average physicist’s undergraduate days, nevertheless the reader may like to
scan this Appendix first, to reacquaint himself with the necessary
mathematics. Appendices B and C are concerned with some practical questions
which arise in the course of estimation problems. In Chapter 8, on the
Least-Squares method, the importance of using orthogonal polynomials is
stressed, and an account of such functions is given in Appendix B. If the
Least-Squares method is used in situations which are not linear, or if the
Maximum Likelihood method of estimation is used, then the problem of
minimizing, or maximizing, a function of several variables arises. Therefore,
Appendix C contains a discussion of the problem of numerical optimization of
a function of $n$ variables. Finally, Appendix D contains a short set of
statistical tables.



2 Probability Theory 

2.1 Introduction 

Statistics is intimately connected with that branch of pure mathematics 
known as probability theory, and before we can meaningfully discuss 
statistics we must say a little about probabilities. Probability theory as 
discussed in textbooks of mathematics uses such ideas as set theory and 
measure theory. We will use a minimum of such concepts but rely instead 
on more mundane methods (including the physicist’s reputedly better 
intuition). 

The term “probability” as used by mathematicians and physical scientists
has two distinct meanings. For the former the parameters and the nature of 
the population are known, and can be specified exactly (e.g. by one of the 
mathematical forms to be discussed in Chapter 4). In this situation probability 
theory can be developed axiomatically. However, in physical situations the 
parameters of the population are rarely known (in fact one of the objects of 
statistical analysis is to obtain values for them), and the problem arises of 
determining which mathematical expression is correct when only a portion 
of the population is known. Without a knowledge of the entire population 
we cannot, of course, make absolutely precise statements concerning how 
the population is distributed, but we can make less precise statements in 
terms of a probability operationally defined as the limit of the relative
frequency of occurrence.

Before proceeding to the definition of probability, however, we will need a 
few simple, but basic, definitions of some subsidiary quantities, which express 
more formally our intuition about such terms as “experiment”, “event”, etc. 

Suppose we can set up a set of initial conditions that are reproducible.
These conditions define an experiment, and by making an observation (or a
set of observations) we produce an outcome of the experiment. The outcomes
we will denote by $x_i$ and they will be either single numbers, or possibly sets
of numbers.

Definition 2.1. The set of all possible outcomes $x_i$ ($i = 1, 2, \ldots, n$) of an
experiment is called the sample space, or population, and $x_i$ is a sample point
in the space.

Definition 2.2. A subset of the sample points, e.g. $x_1, x_2, \ldots, x_m$ ($m \le n$),
is called an event and will be denoted by

$$E = \{x_i \mid i = 1, 2, \ldots, m\}.$$

If $m = n$, i.e. all the sample space is included in the event, then it will be
denoted by

$$S = \{x_i \mid i = 1, 2, \ldots, n\}.$$

The occurrence of an event may now be formally regarded as the situation 
where the sample point to which observations give rise is included in the 
subset of sample points defining the event. 

For example, an experiment could be the tossing of a six-sided die, for 
which the outcomes would be one of the six numbers one to six. A simple 
event would also be one of these numbers, and the occurrence of the event 
would be the situation where the number defining the event was observed on 
the face of the die. A more general example of an experiment is the
measurement of the heights of all males living within a given radius. Here the
outcomes are not discrete, and an event would, in practice, be defined by a
subset of sample points spanning a small interval of heights. Then the 
occurrence of the event would be interpreted as the situation where a 
measured height fell within the specified range. 

We can now proceed to the operational definition of probability. 

Definition 2.3. Consider a sequence of $n$ trials of an experiment in which
the event $E$ of a given class occurs $n_E$ times. Then the ratio $n_E/n$ is called the
relative frequency of the event $E$, and is denoted $R[E]$. The probability $P[E]$
of the event $E$ is the limit approached by $R[E]$ as $n$ increases indefinitely, it
being assumed that this limit exists.

Note that the above definition of probability differs somewhat from the
mathematically similar one, i.e. that for some arbitrarily small quantity $\epsilon$
there exists a large $n$, say $n_L$, such that $|R[E] - P[E]| < \epsilon$ for all $n > n_L$.
The operational definition has an element of uncertainty in it derived from 
the fact that in practice only a finite number of trials can ever be made. This 
way of approaching probability is thus essentially experimental. One assigns 
an a posteriori probability to an event on the basis of experimental
observation. The mathematical approach assigns an a priori probability to an
event on the basis of a given mathematical model about the possible outcomes
of the experiment. A typical situation that occurs in practice is when an
experimentalist constructs a model of nature and computes from that model
certain a priori probabilities concerning the outcomes of an experiment. The
experiment is then performed, and on the basis of the results obtained certain
a posteriori probabilities are calculated for the same events. The value of
the model is then judged by the agreement between these two sets of
probabilities, and on this basis modifications may be made to the model. These
ideas will be put on a more quantitative basis in later chapters when we 
discuss estimation and the testing of hypotheses. 
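
The operational definition can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the function name `relative_frequency`, the choice of a fair six-sided die as the model, and the fixed seed are all introduced here and are not part of the text. It estimates the relative frequency R[E] of the event "a six is thrown" for increasing n and compares it with the a priori value 1/6 given by the model.

```python
import random

def relative_frequency(n_trials, event, seed=0):
    """Return R[E] = n_E / n for n_trials simulated throws of a fair die.

    `event` is the set of faces that count as an occurrence of E.
    The seed is fixed only so that the sketch is reproducible.
    """
    rng = random.Random(seed)
    n_e = sum(1 for _ in range(n_trials) if rng.randint(1, 6) in event)
    return n_e / n_trials

a_priori = 1 / 6  # probability assigned by the model of a fair die
for n in (100, 10_000, 100_000):
    r = relative_frequency(n, {6})
    # The difference generally shrinks as n grows, but only a finite
    # number of trials is ever available, as noted in the text.
    print(n, r, abs(r - a_priori))
```

The a posteriori estimate never reaches the limit exactly; it only approaches the a priori value within the statistical scatter of a finite sample.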

There are many other ways of defining probabilities, and statisticians do 
not agree amongst themselves on the best way. However, in this book we 
will not dwell too much on the differences between the various definitions. 

2.2 Calculus of probabilities 

As remarked above, there is no general agreement on the best way to 
approach the mathematical theory of probability, and so we shall pass 
straight to the rules of probability without discussing the controversial 
fundamentals. We shall start by listing a number of basic definitions which 
relate to our previous definitions. 

Definition 2.4. The probability of an event $E$ is a number in the range
0 to 1, i.e.

$$0 \le P[E] \le 1, \quad \text{and} \quad P[E] = 1 \text{ if } E = S.$$

Definition 2.5. Let $E$ be an arbitrary event of an experiment. Then the
event “not $E$” (the complement of $E$) will be denoted $\bar{E}$.

Definition 2.6. The event “$A$ and $B$” (the intersection of $A$ and $B$), i.e. the
event in which both $A$ and $B$ occur, is denoted $A \cap B$, or $B \cap A$. Thus if

$$A = \{x_i \mid i = 1, 2, \ldots, 10\},$$

and

$$B = \{x_i \mid i = 5, 6, \ldots, 20\},$$

then

$$A \cap B = \{x_i \mid i = 5, 6, \ldots, 10\}.$$

If $A \cap B = \emptyset$ then the events are said to be distinct.

Definition 2.7. The event “$A$ or $B$” (the union of $A$ and $B$), i.e. the event
in which either $A$ or $B$ or both occurs, is denoted $A \cup B$ or $B \cup A$. Thus using
the example of Definition 2.6 we have

$$A \cup B = \{x_i \mid i = 1, 2, \ldots, 20\}.$$

These definitions can be given a simple geometrical interpretation by the
diagram of Fig. 2.1.

Fig. 2.1. The sample space $S$ consists of all points within the boundary curve $C$. $A$ and $B$
are two events in $S$. The doubly-shaded area is $A \cap B$, and the sum of all the shaded areas
is $A \cup B$. Finally, the unshaded area is $\overline{A \cup B}$.
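
The events of Definitions 2.6 and 2.7 map directly onto set operations. As a minimal sketch, the sample points are represented below by their indices i, and a sample space of n = 25 points is assumed purely for illustration (the text does not fix n).

```python
# Sample points are labelled by their index i, as in Definitions 2.6 and 2.7.
A = set(range(1, 11))    # A = {x_i | i = 1, ..., 10}
B = set(range(5, 21))    # B = {x_i | i = 5, ..., 20}
S = set(range(1, 26))    # hypothetical sample space with n = 25 points

intersection = A & B                 # the event "A and B"
union = A | B                        # the event "A or B"
complement_of_union = S - union      # points of S in neither A nor B

print(sorted(intersection))
print(sorted(union))
print(sorted(complement_of_union))
```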

Definition 2.8. If an experiment can result in $n$ mutually exclusive (i.e. the
occurrence of one event precludes the occurrence of the others) and equally
likely outcomes, $n_B$ ($n_B \neq 0$) of which correspond to the occurrence of event
$B$, and $n_{AB}$ of which correspond to the occurrence of the event $A$, given that
$B$ has occurred, then the probability of event $A$ given that $B$ has occurred is

$$P[A \mid B] = \frac{n_{AB}}{n_B}, \qquad (2.1)$$

and is called the conditional probability of $A$.

If we use the operational form of probability, Definition 2.3, then Eqn (2.1)
may be written

$$P[A \mid B] = \frac{P[A \cap B]}{P[B]}. \qquad (2.2)$$


Note, however, that the price of not looking too closely at fundamentals has 
resulted in a definition which is somewhat circular, because of its use of the 
phrase “equally likely”. A simple example will illustrate the use of Eqn (2.2). 

Example 2.1. Three “fair” coins are tossed, and we are told that at least 
two of them have fallen “tails”. What is the probability that the third coin 
has fallen “heads”?

Let $B$ be the event with at least two tails, and let $A$ be the event with at
least one head.

Then

$$P[A \cap B] = 3/8; \quad P[B] = 1/2.$$

Thus, from Eqn (2.2),

$$P[A \mid B] = 3/4,$$

i.e. the probability is 3/4 that the final coin has fallen “heads”.
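
The numbers in Example 2.1 can be checked by enumerating the eight equally likely sample points. The following is a minimal sketch; the helper names `prob`, `in_A` and `in_B` are introduced here for illustration only.

```python
from fractions import Fraction
from itertools import product

# The 8 equally likely outcomes of tossing three fair coins.
outcomes = list(product("HT", repeat=3))

def prob(event):
    """Probability of an event given as a predicate on a sample point."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def in_B(o):
    return o.count("T") >= 2   # at least two tails

def in_A(o):
    return o.count("H") >= 1   # at least one head

p_B = prob(in_B)
p_A_and_B = prob(lambda o: in_A(o) and in_B(o))
p_A_given_B = p_A_and_B / p_B          # Eqn (2.2)
print(p_A_and_B, p_B, p_A_given_B)     # 3/8 1/2 3/4
```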

If the occurrence of an event can be classified according to multiple 
criteria then the term marginal probability is used whenever one or more of 
these criteria are ignored in the classification. If we consider the case of three 
classifications A, B and C, then we are led to the following definition. 

Definition 2.9. If the classifications under the criteria are $A_1, A_2, \ldots, A_r$;
$B_1, B_2, \ldots, B_s$; and $C_1, C_2, \ldots, C_t$; with

$$\sum_{i=1}^{r} P[A_i] = \sum_{j=1}^{s} P[B_j] = \sum_{k=1}^{t} P[C_k] = 1,$$

then the marginal probability of $A_i$ and $C_k$ is

$$P[A_i \cap C_k] = \sum_{j=1}^{s} P[A_i \cap B_j \cap C_k], \qquad (2.3)$$

and, likewise, the marginal probability of $C_k$ is

$$P[C_k] = \sum_{i=1}^{r} \sum_{j=1}^{s} P[A_i \cap B_j \cap C_k]
= \sum_{i=1}^{r} P[A_i \cap C_k]
= \sum_{j=1}^{s} P[B_j \cap C_k].$$
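
Marginal probabilities are obtained simply by summing over the ignored classification. The sketch below assumes a made-up joint table with r = s = t = 2 and weights 1 to 8 (normalized by 36); the table and the function names are hypothetical and serve only to show that the three routes to $P[C_k]$ in Definition 2.9 agree.

```python
from fractions import Fraction
from itertools import product

r, s, t = 2, 2, 2
cells = list(product(range(r), range(s), range(t)))
# Hypothetical joint probabilities P[A_i and B_j and C_k]: weights 1..8 over 36.
joint = {cell: Fraction(w, 36) for w, cell in enumerate(cells, start=1)}

def marginal_AC(i, k):
    """Eqn (2.3): sum over the B classification."""
    return sum(joint[(i, j, k)] for j in range(s))

def marginal_BC(j, k):
    """Sum over the A classification."""
    return sum(joint[(i, j, k)] for i in range(r))

def marginal_C(k):
    """Sum over both the A and B classifications."""
    return sum(joint[(i, j, k)] for i in range(r) for j in range(s))

for k in range(t):
    print(k, marginal_C(k),
          sum(marginal_AC(i, k) for i in range(r)),
          sum(marginal_BC(j, k) for j in range(s)))
```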

The final definition concerns the concept of independence. 

Definition 2.10. The event $A$ is independent of the event $B$ if

$$P[A \mid B] = P[A]. \qquad (2.4)$$

The above definitions may be used very simply to derive the following 
useful results: 

$$P[\bar{A}] = 1 - P[A], \qquad (2.5)$$

$$\begin{aligned} P[A \cap B] &= P[A]\,P[B \mid A] = P[B]\,P[A \mid B] \\ &= P[A]\,P[B] \quad \text{(if $A$ and $B$ are independent)}, \end{aligned} \qquad (2.6)$$

$$\begin{aligned} P[A \cup B] &= P[A] + P[B] - P[A \cap B] \\ &= P[A] + P[B] \quad \text{(if $A$ and $B$ are mutually exclusive)}. \end{aligned} \qquad (2.7)$$

Finally, we shall state the basic theorem of permutations: 

Theorem 2.1. The number of ways of “permuting” (i.e. arranging) $m$ objects
selected from $n$ distinct objects is

$${}_nP_m = \frac{n!}{(n - m)!}. \qquad (2.8)$$

It follows from this theorem that the total number of “combinations” of the
$m$ objects without regard to arrangement is

$${}_nC_m = \frac{{}_nP_m}{m!} = \frac{n!}{m!\,(n - m)!}. \qquad (2.9)$$
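
Eqns (2.8) and (2.9) translate directly into integer arithmetic. The sketch below writes them out explicitly and, as a cross-check, compares them with `math.perm` and `math.comb` from the Python standard library (available from Python 3.8); the function names `n_permutations` and `n_combinations` are chosen here for illustration.

```python
import math

def n_permutations(n, m):
    """Eqn (2.8): n! / (n - m)!"""
    return math.factorial(n) // math.factorial(n - m)

def n_combinations(n, m):
    """Eqn (2.9): the number of permutations divided by m!"""
    return n_permutations(n, m) // math.factorial(m)

print(n_permutations(5, 2), n_combinations(5, 2))  # 20 10
```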


2.3 Statistical inference 

The calculus of probabilities as outlined above proceeds from the definition 
of probabilities of simple events to the probabilities of more complex events. 
In practice, however, what is required is just the inverse, i.e. given certain 
experimental observations we require to know something about the parent 
population and the generating mechanism by which they were produced. 
This, in general, is the problem of statistical inference. To illustrate a basic 
difficulty we shall discuss briefly a theorem propounded by the Rev. Thomas 
Bayes, an English clergyman, in 1763. 


Theorem 2.2. (Bayes’ Theorem). If $B_i$ ($i = 1, \ldots, n$) are mutually exclusive
and exhaustive (i.e. all possible events are included in the $B_i$) events, and if $A$
can occur only in combination with one of the $n$ events $B_i$, then

$$P[B_i \mid A] = \frac{P[B_i]\,P[A \mid B_i]}{\sum_{j=1}^{n} P[B_j]\,P[A \mid B_j]}. \qquad (2.10)$$


The proof of this theorem is very simple but affords an illustration of the use 
of Definitions 2.4 to 2.10.

Proof. From Eqn (2.6)

$$P[A \cap B_i] = P[A]\,P[B_i \mid A]$$

and

$$P[B_i \cap A] = P[B_i]\,P[A \mid B_i],$$

but these two quantities are equal and so

$$P[B_i \mid A] = \frac{P[B_i]\,P[A \mid B_i]}{P[A]}. \qquad (2.11)$$

Now from Definition 2.9 and Eqn (2.6)

$$P[A] = \sum_{j=1}^{n} P[A \cap B_j] = \sum_{j=1}^{n} P[B_j]\,P[A \mid B_j]. \qquad (2.12)$$

Thus from (2.11) and (2.12)

$$P[B_i \mid A] = \frac{P[B_i]\,P[A \mid B_i]}{\sum_{j=1}^{n} P[B_j]\,P[A \mid B_j]}, \qquad (2.13)$$

which completes the proof.

Bayes’ Theorem has some important consequences and we shall examine
it more closely. Suppose an event can be explained by the mutually exclusive
hypotheses represented by $B_1, B_2, \ldots, B_n$. These hypotheses have certain
a priori probabilities $P[B_i]$ of being true. Each of them can give rise to the
occurrence of $A$, but with distinct probabilities $P[A \mid B_i]$. Bayes’ Theorem
tells us how to compute the a posteriori probabilities $P[B_i \mid A]$, which are
the probabilities of having $B_i$ when $A$ is known to have occurred. The quantity
$P[A \mid B_i]$ is called the likelihood. If we had to choose an hypothesis from the
set $B_i$ we would choose that one with the greatest a posteriori probability.
However, Eqn (2.10) shows that this requires a knowledge of the a priori
probabilities $P[B_i]$, and these are, in general, unknown. Bayes’ Postulate is
the hypothesis that, nothing known to the contrary, the a priori probabilities
should all be taken as equal. We will give a simple example of the use of
Bayes’ Postulate.

Example 2.2. Suppose we have a jar (or urn in statistical parlance) which 
contains four balls which may be either all white (hypothesis 1), or two
white and two black (hypothesis 2). If $n$ balls are drawn, one at a time,
replacing them after each drawing, the probabilities of obtaining an event
$E$ with $n$ white balls under the two hypotheses are

$$P[E \mid H_1] = 1; \quad P[E \mid H_2] = 2^{-n}.$$

Now, from Bayes’ Postulate

$$P[H_1] = P[H_2] = 1/2,$$

and so from Eqn (2.10)

$$P[H_1 \mid E] = \frac{1}{1 + 2^{-n}} = \frac{2^n}{1 + 2^n}; \quad P[H_2 \mid E] = \frac{1}{1 + 2^n}.$$

Thus, provided no black balls appear, we would always accept the first 
hypothesis because it has the larger a posteriori probability. 
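
Example 2.2 can be reproduced numerically. The sketch below is a generic evaluation of Eqn (2.10) using exact fractions; the function names `posterior` and `urn_posteriors` are hypothetical, and the equal priors are the assumption of Bayes' Postulate rather than something the data provide.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Eqn (2.10): P[B_i | A] for mutually exclusive, exhaustive B_i."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)                 # P[A], Eqn (2.12)
    return [j / total for j in joint]

def urn_posteriors(n):
    """Example 2.2: n white balls drawn, equal priors (Bayes' Postulate)."""
    priors = [Fraction(1, 2), Fraction(1, 2)]
    likelihoods = [Fraction(1), Fraction(1, 2) ** n]   # P[E|H_1], P[E|H_2]
    return posterior(priors, likelihoods)

for n in (1, 2, 5):
    p1, p2 = urn_posteriors(n)
    print(n, p1, p2)
```

Changing the entries of `priors` shows how strongly the a posteriori probabilities depend on the a priori ones, which is the point at issue in the discussion of Bayes' Postulate.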

Bayes’ Postulate has been the subject of much controversy. In the frequency 
theory of probability it would imply that events corresponding to the various
$B_i$ are distributed with equal frequency in some population from which the
actual $B_i$ has arisen. Many statisticians reject this as being unreasonable.

Later in this book we shall discuss some of the many suggested alternatives 
to Bayes’ Postulate, including the principles of least-squares and minimum 
chi-square. We shall anticipate that discussion by mentioning briefly here 
(and in more detail in Chapter 7), one principle of general application, that 
of maximum likelihood. 

From (2.10) we see that

$$P[B_i \mid A] \propto P[B_i]\,L, \qquad (2.14)$$

where $L$ stands for the likelihood. The Principle of Maximum Likelihood
states that, when confronted with the choice of a set of hypotheses $B_i$, we
choose that one which maximizes $L$, if one exists, i.e. that one which gives
the greatest probability to the observed event. Note that this is not the same
as choosing the hypothesis with the greatest probability. It is not at all
self-evident why one should adopt this particular choice as a principle of
statistical inference, and we will return to this point in Chapter 7. For the
simple case in Example 2.2, the Maximum Likelihood method clearly gives
the same result as Bayes’ Postulate.




3 Theoretical Distributions: Basic Ideas 

3.1 Population parameters 

In most physical work the observer does not have a complete population, 
but has instead a sample, which is a subset of the total population. It is the
central problem of physical statistics to estimate the properties of the
population from the nature of the sample. This process is known as statistical
inference and was briefly introduced in Chapter 2. Firstly, however, we will 
review some very simple, but important, parameters which characterize any 
finite population, but which are not to be used for the purpose of statistical 
inference. Ideas introduced in this section will be used again when we discuss 
sampling in Chapters 5 and 6. 


3.1.1. Measures of Location 

Definition 3.1. The arithmetic mean $\mu_a$ of a set of $N$ values $x_i$ ($i = 1, \ldots, N$)
is defined by

$$\mu_a = \frac{1}{N} \sum_{i=1}^{N} x_i. \qquad (3.1)$$

Although the arithmetic mean is the most commonly used measure of 
location, and what is usually meant when one talks of the “mean”, there 
exist other measures of location which in some circumstances are more useful. 
We will not dwell on these points here because in most physical work the 
arithmetic mean is (in a sense which will be considered later) the “best” 
measure of location. In fact, in later sections, we will revert to the usual
convention of calling the arithmetic mean simply “the mean” and denote it
by $\mu$. However, two other measures of location that are used from time to
time are given below.

Definition 3.2. If the quantities $x_1, x_2, \ldots, x_N$ are arranged in increasing
(or decreasing) value and then renumbered as $x_{(1)}, x_{(2)}, \ldots, x_{(N)}$, the median
$\mu_m$ is defined as the middle value of the new set, for $N$ odd, and as the
midpoint of the middle pair of values if $N$ is even.

Definition 3.3. The mode $M$ is that value of $x$ in the set $x_1, x_2, \ldots, x_N$
which occurs with maximum frequency.


3.1.2. Measures of Dispersion 

Just as in the case of measures of location, where there exist several 
different possible parameters which can be used to characterize a population, 
there is more than one possible measure of the dispersion, or scatter, of the
measurements within the population. We shall simply mention the two most 
useful. 

Definition 3.4. The mean deviation $\delta_m$ is defined as the arithmetic mean of
the absolute values of the deviations of the observations from the median

$$\delta_m = \frac{1}{N} \sum_{i=1}^{N} |x_i - \mu_m|. \qquad (3.2)$$

Definition 3.5. The variance $\sigma^2$ of a population is defined as the
arithmetic mean of the squares of the deviations of $x_i$ from the arithmetic
mean $\mu_a$

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_a)^2. \qquad (3.3)$$

The square root of the variance, $\sigma$, is called the standard deviation. The ratio
$\sigma/\mu_a$ is sometimes called the coefficient of variation.
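
The measures of location and dispersion defined so far are all available in, or easily built from, Python's standard `statistics` module. The sketch below uses a small made-up data set, treated as a complete population, so the population forms `pvariance` and `pstdev` (divisor N) are the ones that match Eqn (3.3); `mean_deviation` is a helper written here because the module has no equivalent of Eqn (3.2).

```python
import statistics

def mean_deviation(xs):
    """Eqn (3.2): mean absolute deviation of the values from their median."""
    med = statistics.median(xs)
    return sum(abs(x - med) for x in xs) / len(xs)

data = [2, 4, 4, 4, 5, 5, 7, 9]      # made-up values, a full population

mu_a = statistics.mean(data)         # arithmetic mean, Definition 3.1
mu_m = statistics.median(data)       # median, Definition 3.2
mode = statistics.mode(data)         # mode, Definition 3.3
delta_m = mean_deviation(data)       # mean deviation, Definition 3.4
variance = statistics.pvariance(data)   # Eqn (3.3), divisor N
sigma = statistics.pstdev(data)         # standard deviation
coeff_of_variation = sigma / mu_a

print(mu_a, mu_m, mode, delta_m, variance, sigma, coeff_of_variation)
```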

3.1.3. Moments and Skewness 

Definition 3.6. The $n$th moment of a population about an arbitrary point
$\bar{x}$ is defined as

$$\mu_n' = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^n. \qquad (3.4)$$

If the point $\bar{x}$ is taken to be the mean $\mu$ (from now on taken to be the
arithmetic mean), then the moments are called the central moments and are
conventionally written without a prime.

From (3.4) we have

$$\mu_0' = 1; \quad \mu_1' = \mu - \bar{x} = d; \quad \mu_2' = \sigma^2 + d^2, \qquad (3.5)$$

$$\mu_0 = 1; \quad \mu_1 = 0; \quad \mu_2 = \sigma^2. \qquad (3.6)$$

The general relation between the two sets of moments is

$$\mu_k' = \sum_{r=0}^{k} \binom{k}{r} \mu_{k-r}\,(\mu_1')^r, \qquad (3.7)$$

with its inverse

$$\mu_k = \sum_{r=0}^{k} \binom{k}{r} \mu_{k-r}'\,(-\mu_1')^r. \qquad (3.8)$$

The importance of moments stems from the fact that a knowledge of the 
first few essentially determines the general characteristics of the distribution. 

The skewness, or deviation from a symmetrical form, is defined in more
than one way, but a common practice is to use the ratio

$$\beta_1 = \mu_3^2 / \mu_2^3, \qquad (3.9)$$

since $\mu_3 = 0$ for a population distributed symmetrically about the mean. A
measure of kurtosis, or degree of peaking, is, likewise, often taken to be the
ratio

$$\beta_2 = \mu_4 / \mu_2^2. \qquad (3.10)$$

We shall see later that $\beta_2 = 3$ for the so-called “Normal Distribution” and
this value is often taken as a standard.
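
The moments of a finite population, and the conversion from moments about an arbitrary point to central moments, can be checked numerically. The sketch below is an illustration under stated assumptions: the data set is made up, the function names are hypothetical, and `central_from_raw` implements the binomial relation of Eqn (3.8) in the form $\mu_k = \sum_r \binom{k}{r} \mu_{k-r}' (-\mu_1')^r$.

```python
from math import comb

def moment(xs, n, about=0.0):
    """Eqn (3.4): n-th moment of the population xs about the point `about`."""
    return sum((x - about) ** n for x in xs) / len(xs)

def central_from_raw(xs, k, about=0.0):
    """Central moment mu_k built from the moments about `about`, Eqn (3.8)."""
    m1 = moment(xs, 1, about)
    return sum(comb(k, r) * moment(xs, k - r, about) * (-m1) ** r
               for r in range(k + 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]      # made-up values, a full population
mu = moment(data, 1)                  # the mean (moment about the origin)
mu2, mu3, mu4 = (moment(data, n, mu) for n in (2, 3, 4))

beta1 = mu3 ** 2 / mu2 ** 3           # skewness measure, Eqn (3.9)
beta2 = mu4 / mu2 ** 2                # kurtosis measure, Eqn (3.10)
print(mu2, mu3, mu4, beta1, beta2)
```

For this data set the direct central moments and those obtained from `central_from_raw` agree to rounding error, whichever point `about` is used.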


3.2 Continuous univariate distributions 

We have, on several occasions above, referred to “distributions” without 
defining the term. Such anticipation is difficult to avoid in a brief work of 
this kind, and the intuitive understanding of simple concepts before they 
are formally defined will be used more than once again. However, to proceed 
further we must consider more carefully what is meant by a distribution. 
This will be done in the present chapter, and will be followed, in Chapter 4, 
by an account of some of the simpler properties of the more important 
distributions encountered in practice. 

In books on advanced statistics it is usual to avoid stating all definitions 
twice, once for continuous distributions and once for discrete, by using 
Stieltjes integration. However, since we shall be concerned with only a few 
distributions we shall merely repeat the steps which prove necessary. 

Firstly, we shall introduce the concept of a random variable. 

Definition 3.7. A random variable is a function which can take on a
definite value at every point in the sample space. Thus, if we have a sample
space $S$ with an associated probability function $P$, and a random variable $X$
defined over the sample space, then to each point $x_i$ of $S$ we can assign a
probability $P[x_i]$ and a definite numerical value $X(x_i)$ for the random
variable.

The number of “heads” obtained by tossing two coins is an example of
a random variable which can assume the discrete values 0, 1 or 2. Thus, if
we distinguish between the two coins, the sample space consists of the
points

$$x_1 = (H, H); \quad x_2 = (H, T); \quad x_3 = (T, H); \quad x_4 = (T, T)$$

and

$$X(x_1) = 2; \quad X(x_2) = 1; \quad X(x_3) = 1; \quad X(x_4) = 0.$$

If the coins are both “true”, then we can calculate $P[X]$ as follows

$$P[X = 2] = P[x_1] = 1/4,$$
$$P[X = 1] = P[x_2 \cup x_3] = 1/2,$$
$$P[X = 0] = P[x_4] = 1/4.$$
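
The two-coin example shows a random variable as a function on the sample space. A minimal sketch of the same construction follows; the names `sample_space`, `X` and `distribution` are chosen here for illustration, and equal weights for the four points express the assumption that both coins are "true".

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

sample_space = list(product("HT", repeat=2))    # the points x_1, ..., x_4
p_point = Fraction(1, len(sample_space))        # each point has weight 1/4

def X(point):
    """Random variable: the number of heads in a sample point."""
    return point.count("H")

# Accumulate P[X = value] by summing over the points that map to each value.
distribution = defaultdict(Fraction)
for point in sample_space:
    distribution[X(point)] += p_point

for value in sorted(distribution):
    print(value, distribution[value])
```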

A random variable can also be continuous if it assumes a continuum of 
values. 


Definition 3.8. The continuous random variable $x$ is said to have a
probability density function (or simply a density function) $f(x)$ if it satisfies the
following conditions:

(i) $f(x)$ is a single-valued non-negative real number for all real values of $x$,

(ii) $f(x)$ is normalized to unity

$$\int_{-\infty}^{\infty} f(x)\,dx = 1, \qquad (3.11)$$

(iii) the probability with which $x$ falls between any two real values $a$ and $b$
for which $a < b$ is given by

$$P[a < x < b] = \int_{a}^{b} dx\, f(x). \qquad (3.12)$$


Definition 3.9. The cumulative distribution function (or simply the
distribution function) $F(x)$ of the continuous random variable $x$ is defined by

$$F(x) = \int_{-\infty}^{x} dt\, f(t). \qquad (3.13)$$

Since the probability that a member chosen at random (i.e. by a method
which makes it equally likely that each member of the population will be
chosen) from a distribution has a value between $x$ and $x + dx$ is just
$f(x)\,dx$, for this is the proportion of the population with values in that
interval, it follows from this definition that the probability of the member
having a value less than or equal to $x$ is the distribution function $F(x)$. It
also follows from Eqn (3.13) that $F(x)$ is a non-decreasing function of $x$ and
that $0 \le F(x) \le 1$. This definition is
clearly consistent with Definition 2.4 of Chapter 2. However, we should
again note the element of circularity in the concept of randomness defined
in terms of probability.
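
The relation between a density function and its distribution function can be made concrete with a simple numerical integration. The sketch below assumes, purely for illustration, the density $f(x) = e^{-x}$ for $x \ge 0$ (zero otherwise), and uses a plain trapezoidal rule written out by hand; neither choice comes from the text.

```python
import math

def f(x):
    """Hypothetical density: f(x) = exp(-x) for x >= 0, zero otherwise."""
    return math.exp(-x) if x >= 0 else 0.0

def integrate(g, a, b, steps=20_000):
    """Composite trapezoidal rule for the integral of g from a to b."""
    h = (b - a) / steps
    total = 0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, steps))
    return total * h

def F(x):
    """Eqn (3.13); the lower limit is 0 because f vanishes for x < 0."""
    return integrate(f, 0.0, x) if x > 0 else 0.0

def prob_between(a, b):
    """P[a < x < b] of Eqn (3.12), evaluated as F(b) - F(a)."""
    return F(b) - F(a)

print(integrate(f, 0.0, 50.0))   # normalization, Eqn (3.11): close to 1
print(F(1.0), 1 - math.exp(-1))  # numerical and exact values of F(1)
print(prob_between(1.0, 2.0))
```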

In terms of these formal definitions we may rewrite some of the earlier
definitions of Section 3.1. Thus the mean about a point $\bar{x}$ is

$$\mu_1' = \int_{-\infty}^{\infty} dx\, f(x)(x - \bar{x}), \qquad (3.14)$$

and the variance

$$\sigma^2 = \int_{-\infty}^{\infty} dx\, f(x)(x - \mu)^2, \qquad (3.15)$$

both of which are special cases of the general moments

$$\mu_n' = \int_{-\infty}^{\infty} dx\, f(x)(x - \bar{x})^n, \qquad (3.16)$$

and for $\bar{x} = \mu$

$$\mu_n = \int_{-\infty}^{\infty} dx\, f(x)(x - \mu)^n. \qquad (3.17)$$

The integrals in (3.16) and (3.17) may not converge for all $n$, and some
distributions possess only the trivial zero-order moment ($\mu_0 = \mu_0' = 1$). In
what follows we shall usually set $\bar{x} = 0$.


3.3 Expected values 

The expected value of a random variable (or any function of a random 
variable) is obtained by finding the average value of the variable over all its 
possible values with due regard to the probability of their occurrence. Our
first definition follows from the definition of a density function. 


Definition 3.10. Let $x$ be a continuous random variable with density
function $f(x)$. Then the expected value of $x$, $E[x]$, is

$$E[x] = \int_{-\infty}^{\infty} x f(x)\,dx. \qquad (3.18)$$

It can be proved from this definition that for a function $h(x)$ of $x$ the
expected value is

$$E[h(x)] = \int_{-\infty}^{\infty} dx\, h(x) f(x). \qquad (3.19)$$

The following easily proved results hold for expected values involving a
random variable $x$ and a function $h(x)$, where $c$ is a constant:

$$E[c] = c,$$

$$E[c\,h(x)] = c\,E[h(x)], \qquad (3.20)$$

$$E[h_1(x) + h_2(x)] = E[h_1(x)] + E[h_2(x)], \qquad (3.21)$$

and

$$E[h_1(x_1)\,h_2(x_2)] = E[h_1(x_1)]\,E[h_2(x_2)], \qquad (3.22)$$

if $x_1$ and $x_2$ are independent variates.

From Eqn (3.18) we see that the $n$th moment of a distribution about any
point $\bar{x}$ is simply the expected value of $(x - \bar{x})^n$. Thus, for example,

$$\mu_n = E[(x - \mu_1')^n].$$

3.4 Generating functions and related topics

It was mentioned earlier that the usefulness of moments stems partly from 
the fact that a knowledge of them determines the form of the distribution 
function. This fact is embodied in the following theorem, which we shall 
state without proof. 

Theorem 3.1. If the moments $\mu_n$ of a random variable $x$ exist, and the series

$$\sum_{n=0}^{\infty} \frac{\mu_n r^n}{n!}$$

converges absolutely for some $r > 0$, then the set of moments $\mu_n$ uniquely
determines the distribution function.

In practice, a knowledge of the first few moments essentially determines 
the general characteristics of the distribution, and it is therefore worthwhile 
to construct a method which gives a representation of all the moments. Such 
a function is called the moment generating function (m.g.f.). 

Definition 3.11. If the random variable $x$ has a density function $f(x)$ then
the moment generating function (m.g.f.) is defined as

$$M_x(t) = E[e^{tx}] = \int_{-\infty}^{\infty} dx\, e^{tx} f(x). \qquad (3.23)$$

To generate the moments from (3.23) we expand $\exp(tx)$, giving

$$M_x(t) = E\left[1 + xt + \frac{1}{2!}(xt)^2 + \ldots\right].$$

Differentiating $n$ times and setting $t = 0$ then gives

$$\mu_n' = \left.\frac{\partial^n M_x(t)}{\partial t^n}\right|_{t=0}. \qquad (3.24)$$

In general, the m.g.f. about any point $\bar{x}$ is

$$M_{\bar{x}}(t) = E[\exp\{(x - \bar{x})t\}]. \qquad (3.25)$$

Thus, for $\bar{x} = \mu$ we have, using Eqn (3.20),

$$M_{\mu}(t) = e^{-\mu t} M_x(t). \qquad (3.26)$$
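
The way moments are generated by differentiating the m.g.f. at $t = 0$ can be sketched numerically. The example below assumes the density $f(x) = e^{-x}$ for $x \ge 0$, whose m.g.f. is $1/(1 - t)$ for $t < 1$; this density, the step size `h`, and the function names are choices made here for illustration. The derivatives are approximated by central differences rather than computed analytically.

```python
def mgf(t):
    """m.g.f. of the density f(x) = exp(-x), x >= 0; valid for t < 1."""
    return 1.0 / (1.0 - t)

def first_moment(M, h=1e-3):
    """Central-difference estimate of dM/dt at t = 0."""
    return (M(h) - M(-h)) / (2 * h)

def second_moment(M, h=1e-3):
    """Central-difference estimate of the second derivative of M at t = 0."""
    return (M(h) - 2 * M(0.0) + M(-h)) / h ** 2

mean = first_moment(mgf)                     # first moment about the origin
variance = second_moment(mgf) - mean ** 2    # second central moment
print(mean, variance)
```

For this density the exact moments about the origin are $n!$, so both printed values should be close to 1, up to the truncation error of the finite differences.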


Another important use of m.g.f.’s is in comparing two density functions, 
when the results of the following theorem may be used. 


Theorem 3.2. Let $x$ and $y$ be two continuous random variables with density
functions $f(x)$ and $g(y)$, respectively. If these distributions possess m.g.f.'s
equal for some interval symmetric about the origin then $f(x) = g(y)$.

It is sometimes more convenient to consider, instead of the m.g.f., its
logarithm. If we write a Taylor expansion for this quantity we have

$$\ln M_x(t) = \kappa_1 t + \kappa_2 \frac{t^2}{2!} + \ldots,$$

where $\kappa_n$ is the so-called cumulant of order $n$, and

$$\kappa_n = \left.\frac{d^n \ln M_x(t)}{dt^n}\right|_{t=0}.$$

The cumulants are rather simply related to the central moments of the
distribution, the first few relations being

$$\kappa_1 = \mu_1', \quad \kappa_2 = \mu_2, \quad \kappa_3 = \mu_3, \quad \kappa_4 = \mu_4 - 3\mu_2^2.$$

For some distributions the integral in Eqn (3.23) defining the m.g.f. will
not exist, and in these circumstances another function called the characteristic
function (c.f.) is introduced.

Definition 3.12. If the random variable $x$ has a density function $f(x)$ then
the characteristic function (c.f.) is defined as

$$\phi_x(t) \equiv E[e^{itx}] = \int_{-\infty}^{\infty} dx\, e^{itx} f(x) = M_x(it). \qquad (3.27)$$

The characteristic function is very important in theoretical statistics, and 
a knowledge of it uniquely determines the density function. This result is 
known as the Inversion Theorem. 

Theorem 3.3 (The Inversion Theorem). If f(x) is a density function with a
distribution function continuous everywhere and has a characteristic function
$\phi_x(t)$ defined by Eqn (3.27) then

$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} dt\, \phi_x(t)\, e^{-ixt}. \tag{3.28}$$

Since $\int_{-\infty}^{\infty} f(x)\, dx$ is absolutely convergent, then, if F(x) is continuous
everywhere (as is required in Theorem 3.3), $\phi_x(t)$ is the Fourier transform of
the density function f(x), and the Inversion Theorem is simply the Fourier
transform theorem.
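The Inversion Theorem can be checked numerically in a simple case. The sketch below assumes the normal c.f. quoted later as Eqn (4.9) and evaluates the integral in Eqn (3.28) by the trapezoidal rule on a truncated range; the cut-off `t_max`, the number of steps and the example parameters are illustrative assumptions.

```python
import cmath
import math

def normal_cf(t, mu, sigma):
    """c.f. of a normal variate: exp(i*t*mu - t^2 * sigma^2 / 2), cf. Eqn (4.9)."""
    return cmath.exp(1j * t * mu - 0.5 * (t * sigma) ** 2)

def invert_cf(cf, x, t_max=10.0, steps=2000):
    """Trapezoidal estimate of (1/2pi) * integral of cf(t) * exp(-i*x*t) dt over [-t_max, t_max]."""
    h = 2.0 * t_max / steps
    total = 0j
    for k in range(steps + 1):
        t = -t_max + k * h
        weight = 0.5 if k in (0, steps) else 1.0  # end points carry half weight
        total += weight * cf(t) * cmath.exp(-1j * x * t)
    return (total * h / (2.0 * math.pi)).real

mu, sigma, x = 0.5, 1.2, 0.7  # assumed example values
recovered = invert_cf(lambda t: normal_cf(t, mu, sigma), x)
exact = math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
print(recovered, exact)
```

The truncation is harmless here only because the normal c.f. decays very rapidly; a slowly decaying c.f. would need a more careful treatment.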


3.5 Discrete univariate distributions 

A probability density, and its associated distribution function, may be 
defined for a set of discrete values $x_1, x_2, \ldots, x_n$ by analogy with the definitions
for continuous univariate distributions given in Section 3.2. Likewise, the
results of Sections 3.3 and 3.4 may be extended in a straightforward way to
the case of discrete variables, and so we shall not discuss them further. 


3.6 Multivariate distributions 

The work of Sections 3.2-3.5 may also be extended to the case of
multivariate distributions. We shall give below a few definitions for continuous
variables. The results for discrete variables are obtainable by obvious
substitutions.

Definition 3.13. The n continuous random variables $x_1, x_2, \ldots, x_n$ are said
to have a multivariate joint density function $f(x_1, x_2, \ldots, x_n)$ if,

(i) $f(x_1, x_2, \ldots, x_n)$ is a single-valued non-negative real number for all real
values of $x_1, x_2, \ldots, x_n$,

(ii) $$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n) \prod_{i=1}^{n} dx_i = 1,$$

(iii) $$P[a_1 < x_1 \le b_1; \ldots; a_n < x_n \le b_n] = \int_{a_n}^{b_n} \cdots \int_{a_1}^{b_1} f(x_1, x_2, \ldots, x_n) \prod_{i=1}^{n} dx_i,$$

where $P[a_1 < x_1 \le b_1; \ldots; a_n < x_n \le b_n]$ denotes the probability that $x_1$ falls
between any two real numbers $a_1$ and $b_1$, $x_2$ falls between $a_2$ and $b_2$, ...,
and $x_n$ falls between $a_n$ and $b_n$, simultaneously.

By analogy with the work of Chapter 2 we shall also define a marginal 
density function, and a conditional density function. 


Definition 3.14. If the n continuous random variables $x_i$ (i = 1, 2, ..., n)
have a joint density function $f(x_1, x_2, \ldots, x_n) = f(\mathbf{x})$, then the marginal
density function of the variables $x_i$ (i = 1, 2, ..., m < n) is the value of $f(\mathbf{x})$
integrated with respect to all variables other than $x_1, x_2, \ldots, x_m$, i.e.

$$f^M(x_1, x_2, \ldots, x_m) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_m, x_{m+1}, \ldots, x_n) \prod_{i=m+1}^{n} dx_i. \tag{3.29}$$


Definition 3.15. The multivariate conditional density function of the random
variables $x_i$ (i = 1, 2, ..., m < n), given the variables $x_{m+1}, x_{m+2}, \ldots, x_n$,
is defined as

$$f^C(x_1, x_2, \ldots, x_m \mid x_{m+1}, x_{m+2}, \ldots, x_n) = \frac{f(x_1, x_2, \ldots, x_n)}{f^M(x_{m+1}, x_{m+2}, \ldots, x_n)}. \tag{3.30}$$


Again by analogy with the work of Chapter 2, this time Definition 2.10, 
we shall consider the concept of statistical independence. 


Definition 3.16. If the random variables $x_i$ (i = 1, 2, ..., n) may be split
into groups such that the density function of the variables is expressible as
a product of marginal density functions of the form

$$f(x_1, x_2, \ldots, x_n) = f_1^M(x_1, x_2, \ldots, x_i)\, f_2^M(x_{i+1}, x_{i+2}, \ldots, x_k) \cdots f_N^M(x_{l+1}, x_{l+2}, \ldots, x_n),$$

then the sets of variables

$$(x_1, \ldots, x_i);\ (x_{i+1}, \ldots, x_k);\ \ldots;\ (x_{l+1}, \ldots, x_n)$$

are said to be statistically independent, or independently distributed.

The multivariate joint distribution function $F(x_1, \ldots, x_n)$ of the n variables,
$x_1, x_2, \ldots, x_n$ is obtained from the generalization of Definition 3.9. Thus

$$F(x_1, \ldots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f(t_1, t_2, \ldots, t_n) \prod_{i=1}^{n} dt_i. \tag{3.31}$$

Likewise, the rth moment of the random variable $x_i$ is obtained from the
generalization of Definition 3.10. Thus

$$E[x_i^r] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_i^r f(x_1, x_2, \ldots, x_n) \prod_{j=1}^{n} dx_j, \tag{3.32}$$

and from this relation we have

$$\mu_i = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_i f(x_1, x_2, \ldots, x_n) \prod_{j=1}^{n} dx_j, \tag{3.33}$$

and

$$\sigma_i^2 = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (x_i - \mu_i)^2 f(x_1, x_2, \ldots, x_n) \prod_{j=1}^{n} dx_j. \tag{3.34}$$

Besides the individual moments defined by Eqn (3.32), the multiplicity of 
variables enables a number of joint moments to be defined. In general, these 
are given by 

$$E[x_i^a x_j^b \cdots x_k^c] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (x_i^a x_j^b \cdots x_k^c) f(x_1, x_2, \ldots, x_n) \prod_{l=1}^{n} dx_l. \tag{3.35}$$

The most important joint moment is the covariance defined as follows. 

Definition 3.17. If the n random variables $x_i$ (i = 1, 2, ..., n) have a joint
density function $f(x_1, x_2, \ldots, x_n)$ then the covariance of any two variables
$x_i$ and $x_j$ is defined as

$$\mathrm{cov}(x_i, x_j) = \sigma_{ij} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (x_i - \mu_i)(x_j - \mu_j) f(x_1, x_2, \ldots, x_n) \prod_{k=1}^{n} dx_k, \tag{3.36}$$

where $\mu_i$ and $\mu_j$ are given by Eqn (3.33). In terms of $\sigma_{ij}$, the correlation
coefficient $\rho(x_i, x_j)$ is defined by

$$\rho(x_i, x_j) = \frac{\mathrm{cov}(x_i, x_j)}{\sigma(x_i)\, \sigma(x_j)}. \tag{3.37}$$

The correlation coefficient is a number lying between +1 and -1. It is a
necessary condition for statistical independence that $\rho(x_i, x_j) = 0$. However,
this is not a sufficient condition and $\rho(x_i, x_j) = 0$ does not imply that $x_i$ and
$x_j$ are independently distributed. The following simple example illustrates
this.


Example 3.1. Let $x_1 = x$ and $x_2 = y = x^2$. Then

$$\mathrm{cov}(x, y) = E[xy] - E[x]\, E[y] = E[x^3] - E[x]\, E[x^2].$$

Now if x has a density function which is symmetric about zero (so that the
mean is zero) then all the odd-order moments vanish and, in particular,

$$E[x] = E[x^3] = 0.$$

Thus cov(x, y) = 0 and hence $\rho(x, y) = 0$, even though x and y are not
independent.
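The point of Example 3.1 can be made concrete with exact arithmetic. The sketch below replaces the continuous density by a hypothetical discrete stand-in (x equally likely to be -2, -1, 0, 1 or 2), which is an assumption made purely so that every expectation is an exact fraction; the covariance vanishes even though the joint probability does not factorize.

```python
from fractions import Fraction

# Hypothetical discrete stand-in for a density symmetric about zero:
# x takes the values -2, -1, 0, 1, 2 with equal probability, and y = x**2.
xs = [Fraction(k) for k in (-2, -1, 0, 1, 2)]
prob = Fraction(1, len(xs))

def expect(g):
    """Expected value of g(x) under the discrete distribution above."""
    return sum(prob * g(x) for x in xs)

# cov(x, y) = E[x^3] - E[x] E[x^2]
cov_xy = expect(lambda x: x ** 3) - expect(lambda x: x) * expect(lambda x: x ** 2)

# Independence would require P(x = 2, y = 4) = P(x = 2) * P(y = 4).
p_x2 = sum(prob for x in xs if x == 2)
p_y4 = sum(prob for x in xs if x ** 2 == 4)
p_joint = sum(prob for x in xs if x == 2 and x ** 2 == 4)

print(cov_xy, p_joint, p_x2 * p_y4)
```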


3.7 Functions of a random variable 

In the previous sections of this chapter we have considered definitions
relating to a continuous random variable x with a given density function f(x).
In practice, however, we may have occasion to refer to a function of x,
e.g. y(x), and the question arises: what is the density function of y(x)?
If y = y(x) is monotonic (strictly increasing or decreasing) then the
solution is simply

$$f(y\{x\}) = f(x\{y\}) \left|\frac{dx}{dy}\right|. \tag{3.38}$$


However, if y(x) has a continuous non-zero derivative at all but a finite
number of points we must split the range into a finite number of sections
in each of which y(x) is a monotonic strictly increasing or strictly decreasing
function of x with a continuous derivative, and apply Eqn (3.38) to each
section separately. Thus, at all points where (i) $dy/dx \neq 0$ and (ii) y = y(x)
has a real finite solution for x = x(y), the required density function is

$$f(y\{x\}) = \sum_{\text{all } x} f(x\{y\}) \left|\frac{dy}{dx}\right|^{-1}. \tag{3.39}$$

If the above conditions are violated then f(y{x}) = 0 at that point.

Example 3.2. Given a random variable x with density function

$$f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right),$$

what is the density function of $y = x^2$?
Now

$$x = \pm\sqrt{y} \quad \text{and} \quad \frac{dy}{dx} = 2x = \pm 2\sqrt{y}.$$

Thus, for y < 0, x is not real and so

$$f(y\{x\}) = 0, \quad y < 0.$$

For y = 0, dy/dx = 0 and so, again,

$$f(y\{x\}) = 0, \quad y = 0.$$

Finally, for y > 0 we may split the range into two parts, x > 0, and x < 0.
Then, applying Eqn (3.39) gives

$$f(y\{x\}) = \frac{1}{2\sqrt{y}} \left[f(x = -\sqrt{y}) + f(x = +\sqrt{y})\right].$$
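The result of Example 3.2 can be checked numerically: integrating the transformed density over an interval [a, b] should reproduce the probability that $x^2$ falls in that interval, computed directly from the standard normal distribution function. The sketch below is one way of doing this; the interval, the midpoint rule and the helper names are illustrative assumptions.

```python
import math

def n_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def n_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def density_of_square(y):
    """Density of y = x^2 for standard normal x: sum over both roots x = -sqrt(y), +sqrt(y)."""
    if y <= 0.0:
        return 0.0
    root = math.sqrt(y)
    return (n_pdf(-root) + n_pdf(root)) / (2.0 * root)

def midpoint_integral(g, a, b, steps=20000):
    """Midpoint-rule estimate of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

a, b = 0.5, 2.0  # assumed example interval in y
prob_from_density = midpoint_integral(density_of_square, a, b)
prob_direct = 2.0 * (n_cdf(math.sqrt(b)) - n_cdf(math.sqrt(a)))
print(prob_from_density, prob_direct)
```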

Similar arguments to those above hold for the multivariate case. Thus, if
we have n random variables $y_i$ (i = 1, 2, ..., n) which are themselves functions
of the n random variables $x_i$ (i = 1, 2, ..., n) defined by

$$y_i = y_i(x_1, x_2, \ldots, x_n), \quad i = 1, 2, \ldots, n,$$

such that $\partial y_i / \partial x_j$ are continuous for all $x_j$, and such that the Jacobian

$$J = \frac{\partial(y_1, y_2, \ldots, y_n)}{\partial(x_1, x_2, \ldots, x_n)} \neq 0,$$

then, in the region where there is a unique solution for $x_i$ in terms of the $y_i$
we have

$$f(y_1, y_2, \ldots, y_n) = |J|^{-1} f(x_1, x_2, \ldots, x_n). \tag{3.40}$$

Again, if the above conditions do not hold then the range of the variables
can always be split into sections, as for the univariate case, and Eqn (3.40)
applied in each section.

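Example 3.3. Given the two random variables $x_1$ and $x_2$ with a joint
density function

$$f(x_1, x_2) = \frac{1}{2\pi} \exp\{-\tfrac{1}{2}(x_1^2 + x_2^2)\},$$

what is the density of the variables

$$y_1 = x_1/x_2 \quad \text{and} \quad y_2 = x_1?$$

The Jacobian of the transformation is

$$J = \det\begin{pmatrix} 1/x_2 & -x_1/x_2^2 \\ 1 & 0 \end{pmatrix} = \frac{x_1}{x_2^2} = \frac{y_1^2}{y_2}.$$

Thus, applying (3.40), (provided $J \neq 0$),

$$f(y_1, y_2) = \frac{|y_2|}{2\pi y_1^2} \exp\left\{-\frac{y_2^2}{2}\left(1 + \frac{1}{y_1^2}\right)\right\}, \quad (y_1 \neq 0).$$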


4 Theoretical Distributions: Examples 


In Chapter 3 we have considered the general properties of theoretical 
distributions. In this chapter we will consider the forms and properties of 
some specific distributions commonly encountered in practice, and two 
others which are useful for illustrative purposes. 


4.1 Uniform distribution 

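Definition 4.1. The uniform distribution for a continuous variable x has a
density function

$$f(x) = u(x; c, d) = \begin{cases} \dfrac{1}{d - c}, & c < x \le d \\ 0, & \text{otherwise.} \end{cases} \tag{4.1}$$

The distribution function obtained from (4.1) is

$$F(x) = \begin{cases} 0, & x \le c \\ \dfrac{x - c}{d - c}, & c < x \le d \\ 1, & x > d. \end{cases} \tag{4.2}$$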


From (3.14) and (3.15) we can easily show that the mean and variance are
given by

$$\mu = \frac{c + d}{2}; \quad \sigma^2 = \frac{(d - c)^2}{12}. \tag{4.3}$$

The Dirac $\delta$-function, well-known to physicists, may be considered as the
limiting form of a uniform distribution where $d \to c$.

Although the uniform distribution is the simplest of all continuous
distributions it is useful in practice for studying rounding errors in
measurements made within a given accuracy. However, its theoretical significance
is enhanced by the following easily proved theorem.

Theorem 4.1. Let f(x) be any density function of a continuous random
variable x, and let F(x) be its distribution function. Then f(x) may be
transformed to the uniform density function

$$g(u) = 1, \quad 0 < u < 1$$

by the transformation u = F(x).

By using this theorem it is possible to exhibit many properties of
continuous distributions in general, by proving them for the particular case of
the uniform distribution. It also follows from this theorem that there exists
at least one transformation which transforms any continuous distribution
into any other, it being simply the product of the transformations which take
each distribution into the uniform distribution.
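Theorem 4.1 can be illustrated by a small simulation. The sketch below draws standard normal values, maps them through their own distribution function to obtain (approximately) uniform values, and then maps those through the inverse of the Cauchy distribution function of Section 4.4 to obtain Cauchy values. The seed, the sample size and the helper names are arbitrary assumptions.

```python
import math
import random

def normal_cdf(x):
    """Standard normal distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cauchy_inverse_cdf(u):
    """Inverse of F(x) = 1/2 + arctan(x)/pi, the Cauchy distribution function with theta = 0."""
    return math.tan(math.pi * (u - 0.5))

rng = random.Random(12345)  # fixed seed so the run is repeatable
xs = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# Step 1: u = F(x) should be uniform on (0, 1).
us = [normal_cdf(x) for x in xs]
mean_u = sum(us) / len(us)
frac_below_quarter = sum(u < 0.25 for u in us) / len(us)

# Step 2: applying the inverse Cauchy distribution function gives Cauchy values.
cauchy_values = [cauchy_inverse_cdf(u) for u in us]
frac_cauchy_below_one = sum(c <= 1.0 for c in cauchy_values) / len(cauchy_values)

print(mean_u, frac_below_quarter, frac_cauchy_below_one)
```

For a uniform variate one expects a mean near 1/2 and a fraction near 1/4 below 0.25; for the Cauchy values the fraction below x = 1 should be near F(1) = 3/4, all up to sampling fluctuations.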


4.2 Univariate normal distribution 

This distribution is by far the most important in statistics since many 
distributions encountered in practice are believed to be of approximately 
normal form, a point to which we will return in Chapter 5. 


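Definition 4.2. The normal density function for a continuous random
variable x is defined to be

$$f(x) = n(x; \mu, \sigma) = \frac{1}{(2\pi)^{1/2}\sigma} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right], \tag{4.4}$$

and its distribution function is

$$F(x) = N(x; \mu, \sigma) = \frac{1}{(2\pi)^{1/2}\sigma} \int_{-\infty}^{x} dt\, \exp\left[-\frac{1}{2}\left(\frac{t - \mu}{\sigma}\right)^2\right]. \tag{4.5}$$

F(x) is also called a Gaussian distribution function. Graphs of f(x) and
F(x) for $\mu = 0$ and $\sigma = 0.5$, 1.0 and 2.0 are shown in Fig. 4.1.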

Since this is the first non-trivial distribution we have encountered it will
be useful to implement some of our previous definitions. Firstly, it is clear
from Eqn (4.4) that f(x) is a single-valued non-negative real number for all
values of x ($\sigma > 0$). Furthermore, by the transformation

$$t = \frac{x - \mu}{\sqrt{2}\,\sigma}$$

we can write

$$\int_{-\infty}^{\infty} dx\, f(x) = \frac{1}{\pi^{1/2}} \int_{-\infty}^{\infty} dt\, e^{-t^2}.$$

Since the latter integral is $\pi^{1/2}$ we see that f(x) is normalized to unity and
is thus a valid density function.

Fig. 4.1. The normal density $n(x; \mu, \sigma)$, and its distribution function $N(x; \mu, \sigma)$, for $\mu = 0$
and $\sigma = 0.5$, 1.0 and 2.0.

To find the moments of the normal distribution we first find the m.g.f.
From Definition 3.11, Eqn (3.23),

$$M_x(t) = E[\exp(tx)] = \exp(t\mu)\, E[\exp\{t(x - \mu)\}]$$

$$= \frac{\exp(t\mu + \sigma^2 t^2/2)}{(2\pi)^{1/2}\sigma} \int_{-\infty}^{\infty} dx\, \exp\left[\frac{(x - \mu - \sigma^2 t)^2}{-2\sigma^2}\right].$$

This integral is related to the area under a normal curve with mean $(\mu + \sigma^2 t)$
and variance $\sigma^2$. Thus

$$M_x(t) = \exp(t\mu + \sigma^2 t^2/2). \tag{4.6}$$

On differentiating (4.6) twice and setting t = 0 we have

$$\mu_1' = \mu,$$

$$\mu_2' = \sigma^2 + \mu^2,$$

and

$$\mathrm{var}(x) = \mu_2' - (\mu_1')^2 = \sigma^2.$$

Thus the mean and variance of the normal distribution are $\mu$ and $\sigma^2$,
respectively. The same techniques for moments about the mean give

$$\mu_{2n} = \frac{(2n)!}{n!\, 2^n}\, \sigma^{2n}, \quad \mu_{2n+1} = 0, \quad n \ge 1. \tag{4.7}$$

The odd-order moments are zero by virtue of the symmetry of the
distribution. Using (4.7) in Eqns (3.9) and (3.10) gives

$$\beta_1 = 0; \quad \beta_2 = 3. \tag{4.8}$$

This value of $\beta_2$ is taken as a standard against which the kurtosis of other
distributions may be compared.

Using essentially the same technique as was used to derive the m.g.f.
above we can show that the normal c.f. is

$$\phi(t) = \exp[it\mu - t^2\sigma^2/2], \tag{4.9}$$

which agrees with Eqn (3.27), and which may be confirmed by applying the
result of the Inversion Theorem.

For the special case when $\mu = 0$ and $\sigma^2 = 1$ we have, from (4.4) and (4.5)

$$n(t; 0, 1) = \frac{1}{(2\pi)^{1/2}} \exp(-t^2/2), \tag{4.10}$$

$$N(t; 0, 1) = \frac{1}{(2\pi)^{1/2}} \int_{-\infty}^{t} du\, \exp(-u^2/2). \tag{4.11}$$

These forms are called the standard normal density function, and standard
normal distribution function, respectively, and will simply be denoted by
n(t) and N(t). Tables of n(t) and N(t) are given in Appendix D. They may
be used for all families of normal curves if the following, easily proved,
relations are noted:

$$n(-t) = n(t), \tag{4.12}$$

$$N(-t) = 1 - N(t), \tag{4.13}$$

$$2\int_{0}^{t} du\, n(u) = \int_{-t}^{t} du\, n(u) = 2N(t) - 1. \tag{4.14}$$

Using Eqns (4.12) - (4.14) and the tables in Appendix D the following useful
results may be deduced.

(i) The proportions of standard normal values contained within 1, 2 and 3
standard deviations from the mean are 68.3%, 95.4%, and 99.7%,
respectively.

(ii) If $t_\alpha$ denotes that value of the standard normal distribution for which

$$\int_{t_\alpha}^{\infty} dt\, n(t) = \alpha, \tag{4.15}$$

then $(\mu \pm t_\alpha \sigma)$ are the limits of the $100(1 - 2\alpha)\%$ symmetric interval
about $\mu$.

The usefulness of such results will be evident when we discuss confidence
intervals in Chapter 10.
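In place of tables, the quantities in (i) and (ii) can be computed from the error function. The sketch below is one way to do so; it assumes that Eqn (4.15) defines $t_\alpha$ through an upper-tail probability $\alpha$, and the bisection bounds and iteration count are arbitrary choices.

```python
import math

def N(t):
    """Standard normal distribution function N(t)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def central_proportion(t):
    """Proportion of standard normal values within +/- t, i.e. 2N(t) - 1 as in Eqn (4.14)."""
    return 2.0 * N(t) - 1.0

def t_alpha(alpha, lo=0.0, hi=10.0):
    """Bisection for the t with upper-tail probability 1 - N(t) = alpha (assumes 0 < alpha < 0.5)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 1.0 - N(mid) > alpha:
            lo = mid  # tail still too large: move right
        else:
            hi = mid
    return 0.5 * (lo + hi)

percentages = [round(100.0 * central_proportion(k), 1) for k in (1, 2, 3)]
t_025 = t_alpha(0.025)  # limits of the 95% symmetric interval
print(percentages, t_025)
```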

One final result which we shall quote for the univariate normal concerns 
the distribution of a linear sum of normally distributed random variables. 


Theorem 4.2. If $x_i$ (i = 1, 2, ..., n) are n independent random variables having
normal distributions $N(x_i; \mu_i, \sigma_i^2)$ then the random variable $T = \sum a_i x_i$ is
distributed as $N(T; \mu, \sigma^2)$ where

$$\mu = \sum_{i=1}^{n} a_i \mu_i \quad \text{and} \quad \sigma^2 = \sum_{i=1}^{n} a_i^2 \sigma_i^2.$$

Proof. The c.f. of T is given by

$$\phi_T(t) = E[\exp(itT)] = E\left[\exp\left(it \sum_{i=1}^{n} a_i x_i\right)\right].$$

Using the fact that the $x_i$ are independent we may write this as [cf. Eqn
(3.22)]

$$\phi_T(t) = \prod_{i=1}^{n} E[\exp(it a_i x_i)] = \prod_{i=1}^{n} \phi_i(t),$$

where $\phi_i(t)$ is the c.f. of the random variable $(a_i x_i)$. Now we have previously
shown that [cf. Eqn (4.9)]

$$\phi_i(t) = \exp[it a_i \mu_i - t^2 a_i^2 \sigma_i^2/2],$$

and so

$$\phi_T(t) = \exp\left\{\sum_{i=1}^{n} (it a_i \mu_i - t^2 a_i^2 \sigma_i^2/2)\right\},$$

i.e.

$$\phi_T(t) = \exp(it\mu - t^2\sigma^2/2),$$

where

$$\mu = \sum_{i=1}^{n} a_i \mu_i; \quad \sigma^2 = \sum_{i=1}^{n} a_i^2 \sigma_i^2.$$

But this is the c.f. of a normal variate whose mean is $\mu$ and whose variance
is $\sigma^2$. Thus, by the Inversion Theorem, T is distributed as $N(T; \mu, \sigma^2)$.
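The key step of the proof, that the product of the individual characteristic functions is again a normal c.f., is easy to check numerically. In the sketch below the coefficients, means and variances are arbitrary assumed values, and `normal_cf` is a hypothetical helper implementing Eqn (4.9).

```python
import cmath
import math

def normal_cf(t, mu, var):
    """c.f. of a normal variate with mean mu and variance var, cf. Eqn (4.9)."""
    return cmath.exp(1j * t * mu - 0.5 * t * t * var)

a = [2.0, -1.0, 0.5]          # assumed coefficients a_i
mus = [1.0, 0.0, -3.0]        # assumed means mu_i
variances = [0.5, 2.0, 1.0]   # assumed variances sigma_i^2

mu_T = sum(ai * mi for ai, mi in zip(a, mus))
var_T = sum(ai ** 2 * vi for ai, vi in zip(a, variances))

t = 0.7
# The c.f. of a_i * x_i evaluated at t is the c.f. of x_i evaluated at a_i * t.
product_of_cfs = math.prod(normal_cf(ai * t, mi, vi) for ai, mi, vi in zip(a, mus, variances))
cf_of_T = normal_cf(t, mu_T, var_T)
print(mu_T, var_T, abs(product_of_cfs - cf_of_T))
```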


4.3 Multivariate normal distribution 

The normal distribution for the multivariate case is defined as follows: 

Definition 4.3. If $x_1, x_2, \ldots, x_n = \mathbf{x}$ are n random variables, then the
multivariate normal density function, of order n, is

$$f(\mathbf{x}; \boldsymbol{\mu}, \mathbf{V}) = \frac{1}{(2\pi)^{n/2} |\mathbf{V}|^{1/2}} \exp\left[-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \mathbf{V}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right], \tag{4.16}$$

where $\boldsymbol{\mu}$ is a constant vector, which is the mean of the distribution, and $\mathbf{V}$
is a symmetric positive-definite matrix, which is the variance matrix of the
vector $\mathbf{x}$. The quantity

$$Q = (\mathbf{x} - \boldsymbol{\mu})^T \mathbf{V}^{-1} (\mathbf{x} - \boldsymbol{\mu}), \tag{4.17}$$

is called the quadratic form of the multivariate normal distribution.

The multivariate normal distribution possesses a number of important
properties, and we shall consider three of these here. The first concerns the
form of the joint marginal distribution of a subset of the n variables.

Theorem 4.3. If the n random variables $x_1, x_2, \ldots, x_n$ are distributed as the
n-variate normal distribution then the joint marginal distribution of any subset
$x_i$ (i = 1, 2, ..., m < n) is the m-variate normal.

This theorem can be proved in a straightforward manner by simply
constructing the joint marginal distribution from Eqn (4.16), but we will not
reproduce the details here. It follows from Theorem 4.3 that any single
random variable in the set $x_i$ (this is the case m = 1) is distributed as
the univariate normal. We shall use this result in the second
theorem which concerns the conditions under which the variables of the
distribution are independent.

Theorem 4.4. If the random variables $x_1, x_2, \ldots, x_n = \mathbf{x}$ have the
multivariate normal distribution with mean vector $\boldsymbol{\mu}$ and variance matrix $\mathbf{V}$, then
the components of $\mathbf{x}$ are jointly independent if and only if $\mathrm{cov}(x_i, x_j) = 0$
for all $i \neq j$.

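Proof. If $\mathrm{cov}(x_i, x_j) = 0$ for $i \neq j$ then the variance matrix $\mathbf{V}$ is diagonal.
In this case the quadratic form of the distribution becomes

$$(\mathbf{x} - \boldsymbol{\mu})^T \mathbf{V}^{-1} (\mathbf{x} - \boldsymbol{\mu}) = \sum_{i=1}^{n} (x_i - \mu_i)^2\, V_{ii}^{-1},$$

and so the density function may be written

$$f(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} |\mathbf{V}|^{1/2}} \exp\left[-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \mathbf{V}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right],$$

i.e.

$$f(\mathbf{x}) = \prod_{i=1}^{n} f_i(x_i),$$

where

$$f_i(x_i) = \frac{1}{(2\pi V_{ii})^{1/2}} \exp\left[-\frac{1}{2} \frac{(x_i - \mu_i)^2}{V_{ii}}\right]. \tag{4.18}$$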







Now (4.18) is the form of the density function for a univariate normal
distribution and so, by virtue of Theorem 4.3 and Definition 3.16, the
variables $x_i$ are independently distributed.

To establish the converse, i.e. that if the $x_i$ are jointly independent then $\mathbf{V}$
is diagonal, we start from the definition of $\mathbf{V}$,

$$V_{ij} = \mathrm{cov}(x_i, x_j) = E[(x_i - \mu_i)(x_j - \mu_j)], \quad (i \neq j)$$

$$= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (x_i - \mu_i)(x_j - \mu_j) f(\mathbf{x}) \prod_{k=1}^{n} dx_k.$$

Now since $x_i$ and $x_j$ are independent we have

$$f(\mathbf{x}) = \prod_{i=1}^{n} f_i(x_i),$$

and hence

$$V_{ij} = \int_{-\infty}^{\infty} (x_i - \mu_i) f_i(x_i)\, dx_i \int_{-\infty}^{\infty} (x_j - \mu_j) f_j(x_j)\, dx_j \prod_{k \neq i,j} \int_{-\infty}^{\infty} f_k(x_k)\, dx_k.$$

But, by definition,

$$\int_{-\infty}^{\infty} (x_i - \mu_i) f_i(x_i)\, dx_i = 0,$$

and so,

$$V_{ij} = 0 \quad \text{for all } i \neq j.$$

This completes the proof.


The third, and final, result concerns the distribution of linear
combinations of random variables, each of which itself has a univariate normal
distribution. The following theorem is a generalization of Theorem 4.2.


Theorem 4.5. If $\mathbf{x} = x_1, x_2, \ldots, x_n$ has a multivariate normal distribution
with mean $\boldsymbol{\mu}$ and variance matrix $\mathbf{V}$, then any linear combination of $x_i$, say

$$S = \sum_{i=1}^{n} a_i x_i,$$

with $a_i$ a set of constants, has a univariate normal distribution with mean

$$\mu = \sum_{i=1}^{n} a_i \mu_i$$

and variance

$$\sigma^2 = \sum_{j=1}^{n} \sum_{i=1}^{n} a_i a_j V_{ij}.$$

Proof. Let

$$S = \sum_{i=1}^{n} a_i x_i = \mathbf{x}^T \mathbf{A},$$

where $\mathbf{A} = \{a_i\}$. The m.g.f. of S is

$$M_S(t) = E[\exp(St)]$$

$$= E[\exp\{(\mathbf{x}^T \mathbf{A})t\}]$$

$$= E[\exp\{(\mathbf{x} - \boldsymbol{\mu})^T \mathbf{A} t + (\boldsymbol{\mu}^T \mathbf{A})t\}]$$

$$= \exp[(\boldsymbol{\mu}^T \mathbf{A})t]\, E[\exp\{(\mathbf{x} - \boldsymbol{\mu})^T \mathbf{A} t\}].$$

Now

$$E[\exp\{(\mathbf{x} - \boldsymbol{\mu})^T \mathbf{A} t\}] = \exp\left[\frac{t^2}{2}(\mathbf{A}^T \mathbf{V} \mathbf{A})\right],$$

thus

$$M_S(t) = \exp[(\boldsymbol{\mu}^T \mathbf{A})t + (\mathbf{A}^T \mathbf{V} \mathbf{A})t^2/2]. \tag{4.19}$$

But from Eqn (4.6) this is the m.g.f. of a normal variate with mean

$$\mu = \boldsymbol{\mu}^T \mathbf{A} = \sum_{i=1}^{n} a_i \mu_i,$$

and variance

$$\sigma^2 = \mathbf{A}^T \mathbf{V} \mathbf{A} = \sum_{j=1}^{n} \sum_{i=1}^{n} a_i a_j V_{ij}.$$

Thus, by Theorem 3.2, S is distributed as $N(S; \mu, \sigma^2)$.
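The mean and variance in Theorem 4.5 are simple to evaluate for a given variance matrix. The sketch below uses a hypothetical two-variable example with correlation 0.5 (all numerical values are assumptions) to show how the off-diagonal terms change the variance of a difference compared with the independent case of Theorem 4.2.

```python
def linear_combination_moments(a, mean, V):
    """Mean sum_i a_i mu_i and variance sum_ij a_i a_j V_ij of S = sum_i a_i x_i."""
    n = len(a)
    mu_S = sum(a[i] * mean[i] for i in range(n))
    var_S = sum(a[i] * a[j] * V[i][j] for i in range(n) for j in range(n))
    return mu_S, var_S

sigma_x, sigma_y, rho = 2.0, 1.0, 0.5  # assumed example values
V = [
    [sigma_x ** 2, rho * sigma_x * sigma_y],
    [rho * sigma_x * sigma_y, sigma_y ** 2],
]

# S = x - y with means 3 and 1.
mu_S, var_S = linear_combination_moments([1.0, -1.0], [3.0, 1.0], V)

# The same combination with the correlation switched off (diagonal V).
V_diag = [[sigma_x ** 2, 0.0], [0.0, sigma_y ** 2]]
_, var_S_independent = linear_combination_moments([1.0, -1.0], [3.0, 1.0], V_diag)
print(mu_S, var_S, var_S_independent)
```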

An important special example of the multivariate normal distribution
is the bivariate case, which occurs frequently in practice. The density
function is

$$n(x, y; \mu_x, \mu_y, \sigma_x, \sigma_y, \rho) = n(x, y) = \frac{1}{2\pi\sigma_x\sigma_y(1 - \rho^2)^{1/2}} \exp\left[-\frac{1}{2(1 - \rho^2)}\left\{\left(\frac{x - \mu_x}{\sigma_x}\right)^2 - 2\rho\frac{(x - \mu_x)(y - \mu_y)}{\sigma_x\sigma_y} + \left(\frac{y - \mu_y}{\sigma_y}\right)^2\right\}\right], \tag{4.20}$$

where $\rho$ is the correlation coefficient, as defined by Eqn (3.37). If the exponent
in (4.20) is a constant (-K), i.e.

$$\left(\frac{x - \mu_x}{\sigma_x}\right)^2 - 2\rho\frac{(x - \mu_x)(y - \mu_y)}{\sigma_x\sigma_y} + \left(\frac{y - \mu_y}{\sigma_y}\right)^2 = 2(1 - \rho^2)K,$$

then the points (x, y) lie on an ellipse with centre $(\mu_x, \mu_y)$. In fact the density
function (4.20) is a bell-shaped surface, and any plane parallel to the xy
plane which cuts this surface will intersect it in an elliptical curve. Any
plane perpendicular to the xy plane will cut the surface in a curve of the
normal form.

Just as for the univariate normal distribution we can define a standard
bivariate normal density function

$$n(u, v) = \frac{1}{2\pi(1 - \rho^2)^{1/2}} \exp\left[\frac{(u^2 - 2\rho uv + v^2)}{-2(1 - \rho^2)}\right]. \tag{4.22}$$

A feature of the bivariate normal distribution is that for $\rho = 0$

$$n(u, v) = n(u)\, n(v), \tag{4.23}$$

which from Theorem 4.4 implies that u and v are independently distributed,
a result which is not generally true for all bivariate density functions.

Finally, the joint moment generating function for this distribution may
be obtained as follows

$$M_{xy}(t_1, t_2) = E[\exp(t_1 x + t_2 y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp(t_1 x + t_2 y) f(x, y)\, dx\, dy. \tag{4.24}$$

Using the same technique which we used for obtaining the m.g.f. for the
univariate normal distribution we set

$$u = \frac{x - \mu_x}{\sigma_x}; \quad v = \frac{y - \mu_y}{\sigma_y}.$$

Then

$$M_{xy}(t_1, t_2) = \frac{\exp(t_1\mu_x + t_2\mu_y)}{2\pi(1 - \rho^2)^{1/2}} \int\!\!\int e^{(t_1\sigma_x u + t_2\sigma_y v)} \exp\left[\frac{u^2 - 2\rho uv + v^2}{-2(1 - \rho^2)}\right] du\, dv,$$

and substituting

$$w = \frac{u - \rho v - (1 - \rho^2) t_1 \sigma_x}{(1 - \rho^2)^{1/2}},$$

$$z = v - \rho t_1 \sigma_x - t_2 \sigma_y,$$

we have

$$M_{xy}(t_1, t_2) = \exp[t_1\mu_x + t_2\mu_y + \tfrac{1}{2}(t_1^2\sigma_x^2 + 2\rho t_1 t_2 \sigma_x\sigma_y + t_2^2\sigma_y^2)]. \tag{4.25}$$

The moments may be obtained in the usual way by evaluating the derivatives
of Eqn (4.25) at $t_1 = t_2 = 0$. Thus, e.g.

$$E[x^2] = \left.\frac{\partial^2 M_{xy}(t_1, t_2)}{\partial t_1^2}\right|_{t_1 = t_2 = 0} = \sigma_x^2 + \mu_x^2.$$

4.4 Cauchy distribution 

The Cauchy distribution is a simple form occurring frequently in practice
(e.g. the shape of atomic spectral lines) but possessing properties which
serve as a useful reminder that not all distributions are well-behaved. It is
defined as follows.


Definition 4.4. The Cauchy density function for a random variable x is

$$f(x; \theta) = \frac{1}{\pi} \cdot \frac{1}{1 + (x - \theta)^2}, \quad -\infty < x < \infty.$$

The parameter $\theta$ can be interpreted as the mean $\mu$ of the distribution
only if the definition of the mean is extended as follows,

$$\mu = \lim_{N \to \infty} \int_{-N}^{N} dx\, f(x; \theta)\, x.$$

This is somewhat questionable and we will, in general, set $\theta = 0$. Then the
distribution function becomes

$$F(x) = \frac{1}{2} + \frac{1}{\pi} \arctan(x).$$

The moment about the mean (zero) of order 2n is

$$\mu_{2n} = \frac{1}{\pi} \int_{-\infty}^{\infty} dx\, \frac{x^{2n}}{1 + x^2}, \tag{4.26}$$

but this integral converges only for n = 0, and so only the trivial moment
$\mu_0 = 1$ exists. Finally, it can be shown that the ratio of two standardized
normal deviates has a Cauchy density function. This is one reason why it
is encountered in practice.
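The divergence of the second moment can be seen by cutting the integral in Eqn (4.26) off at $\pm N$. For n = 1 the truncated integral has the closed form $(2/\pi)(N - \arctan N)$, which grows without bound as N increases. The sketch below evaluates this closed form and cross-checks it against a midpoint-rule integration; the cut-offs and step count are illustrative assumptions.

```python
import math

def truncated_second_moment(N):
    """(1/pi) * integral of x^2 / (1 + x^2) over [-N, N], in closed form."""
    return (2.0 / math.pi) * (N - math.atan(N))

def midpoint_second_moment(N, steps=100000):
    """Midpoint-rule estimate of the same truncated integral, as a cross-check."""
    h = 2.0 * N / steps
    total = 0.0
    for k in range(steps):
        x = -N + (k + 0.5) * h
        total += x * x / (1.0 + x * x)
    return total * h / math.pi

values = [truncated_second_moment(N) for N in (10.0, 100.0, 1000.0)]
check = midpoint_second_moment(10.0)
print(values, check)
```

The truncated moment increases roughly linearly with the cut-off, so no finite variance can be assigned to the distribution.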


4.5 Binomial distribution 

Consider a population of members each of which either possesses a certain 
attribute P, or does not possess this attribute. Denote the latter possibility 
by Q. If the proportion of members possessing P is p , and that possessing 
Q is q , then clearly (p + q) = 1. An experiment involving such a population 
is called a Bernoulli trial , i.e. one with only two possible outcomes. Suppose 
now we wish to choose sets from the population, each of which contains n 
members. The proportion of cases containing r P's and (n - r) Q's will be

$$\binom{n}{r} p^r q^{n-r}, \tag{4.27}$$

i.e. the rth term in the binomial expansion of

$$f(p, q) = (q + p)^n. \tag{4.28}$$

Expressed in another way, if p is the chance of an event happening in a
single trial, then for n independent trials the terms in the expansion

$$f(p, q) = q^n + n q^{n-1} p + \dots + p^n,$$

give the chances of 0, 1, 2, ..., n events happening. Thus we are led to the
following definition.

Definition 4.5. The probability density of the binomial distribution is
defined as

$$f(r; p, q) = \binom{n}{r} p^r q^{n-r}, \tag{4.29}$$

and gives the probability of obtaining r = 0, 1, 2, ..., n successes, i.e. events
having the attribute P, in an experiment consisting of n Bernoulli trials
(i.e. p + q = 1).

Tables of the binomial distribution function are given in Appendix D.
The moment generating function may be found directly from Eqn (4.29)
and the definition (3.23), which for the discrete case becomes

$$M_r(t) = \sum_{r=0}^{n} f(r; p, q)\, e^{tr}.$$





Using (4.29) we have

$$M_r(t) = \sum_{r=0}^{n} \binom{n}{r} p^r q^{n-r} e^{tr} = (p e^t + q)^n, \tag{4.30}$$

and hence, using Eqn (3.25), we have

$$\mu_1' = \mu = np, \tag{4.31}$$

$$\mu_2' = np + n(n - 1)p^2,$$

and

$$\sigma^2 = \mu_2' - (\mu_1')^2 = npq. \tag{4.32}$$

The m.g.f. for moments about the mean is

$$M_\mu(t) = e^{-\mu t} M(t), \tag{4.33}$$

and gives

$$\mu_3 = npq(q - p), \quad \mu_4 = npq[1 + 3(n - 2)pq]. \tag{4.34}$$

Using (4.34) in Eqns (3.9) and (3.10) gives

$$\beta_1 = (q - p)^2/(npq), \quad \beta_2 = 3 + (1 - 6pq)/(npq), \tag{4.35}$$

which tend to the values for the normal distribution as $n \to \infty$. It is, in fact,
of interest to consider the limiting form of the binomial distribution as
$n \to \infty$, and this is provided by the following theorem.


Theorem 4.6. The limiting form of the binomial distribution as $n \to \infty$ is the
standard form of the normal distribution.

Proof. The characteristic function of the binomial distribution is, from
Eqns (3.27) and (4.30),

$$\phi_r(t) = (q + p e^{it})^n. \tag{4.36}$$

Now any distribution may be expressed in standard measure (i.e. with
$\mu = 0$ and $\sigma^2 = 1$) by the transformation

$$x = (r - \mu)/\sigma. \tag{4.37}$$

Then

$$\phi_r(t) = \int_{-\infty}^{\infty} dx\, f(x) \exp[it(\sigma x + \mu)] = \exp(it\mu)\, \phi_x(\sigma t).$$

From Eqns (4.31) and (4.32)

$$\mu = np \quad \text{and} \quad \sigma^2 = npq.$$

Thus, using Eqn (4.36) we have

$$\phi_x(t) = \exp\left[\frac{-itnp}{(npq)^{1/2}}\right] \left\{q + p \exp\left[\frac{it}{(npq)^{1/2}}\right]\right\}^n,$$

giving

$$\ln \phi_x(t) = \frac{-itnp}{(npq)^{1/2}} + n \ln\left\{1 + p\left(\exp\left[\frac{it}{(npq)^{1/2}}\right] - 1\right)\right\},$$

where we have used the relation p + q = 1 in the log term. If we now let
$n \to \infty$, keeping t finite, we may expand the log term and we find

$$\ln \phi_x(t) = -t^2/2 + O(n^{-1/2}).$$

Thus, for any finite t

$$\phi_x(t) \to \exp(-t^2/2).$$

However, this is the form of the c.f. of a standardized normal distribution
[cf. Eqn (4.9)], and, by the Inversion Theorem, the associated density function
is

$$f(x) = \frac{1}{(2\pi)^{1/2}} \exp(-x^2/2), \tag{4.38}$$

which is the standardized form of the normal distribution.

The normal approximation to the binomial distribution is reasonable even
down to values of $n \simeq 8$.
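To get a feeling for the quality of the normal approximation at small n, the binomial probabilities can be compared directly with a normal density of mean np and variance npq. The sketch below does this for the assumed example n = 8, p = 1/2; comparing the probabilities with the density evaluated at the integers is one simple choice among several.

```python
import math

def binomial_pmf(r, n, p):
    """Binomial probability of r successes in n trials, Eqn (4.29)."""
    return math.comb(n, r) * p ** r * (1.0 - p) ** (n - r)

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma, Eqn (4.4)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

n, p = 8, 0.5  # assumed example values
q = 1.0 - p
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]

mean = sum(r * f for r, f in enumerate(pmf))                    # should equal n*p
variance = sum((r - mean) ** 2 * f for r, f in enumerate(pmf))  # should equal n*p*q

sigma = math.sqrt(n * p * q)
max_abs_diff = max(abs(f - normal_pdf(r, n * p, sigma)) for r, f in enumerate(pmf))
print(mean, variance, max_abs_diff)
```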


4.6 Multinomial distribution 

The multinomial distribution is the generalization of the binomial
distribution to the case of repeated trials where there are more than two possible
outcomes. It is defined as follows.

Definition 4.6. If an event may occur with k possible outcomes each with
a probability $p_i$ (i = 1, 2, ..., k),

$$\sum_{i=1}^{k} p_i = 1, \tag{4.39}$$

and if $r_i$ is the number of times the outcome associated with $p_i$ occurs, then
the density function for the random variables $r_i$ (i = 1, 2, ..., k - 1) is the
multinomial and is defined as

$$f(r_1, r_2, \ldots, r_{k-1}) = \frac{n!}{\prod_{i=1}^{k} r_i!} \prod_{i=1}^{k} p_i^{r_i}, \quad r_i = 0, 1, \ldots, n. \tag{4.40}$$

Note that each of the $r_i$ may range from 0 to n inclusive, and that only (k - 1)
variables are involved because of the linear constraint

$$\sum_{i=1}^{k} r_i = n.$$

With suitable generalizations the results of Section 4.5 may be extended 
to the multinomial distribution, and, in particular, this distribution tends, 
in the limit, to the multivariate normal distribution. 


4.7 Poisson distribution 

The Poisson distribution is an important distribution occurring frequently 
in practice which is derived from the binomial distribution by a special 
limiting process. Consider the binomial distribution for the case when p, 
the proportion of the population possessing the attribute P, is very small 
but n , the number of members of a given set, is large such that 

$$\lim_{p \to 0} (np) = r, \tag{4.41}$$

where r is a finite positive constant, i.e. where

$$n \gg np \gg p.$$

The kth term in the binomial distribution then becomes

$$\binom{n}{k} p^k (1 - p)^{n-k} = \frac{n!}{k!\,(n - k)!} \left(\frac{r}{n}\right)^k \left(1 - \frac{r}{n}\right)^{n-k} \to \frac{r^k}{k!}\, e^{-r}.$$

This is the density function of the Poisson distribution.

Definition 4.7. The density function of the Poisson distribution is

$$f(k; r) = \frac{e^{-r} r^k}{k!}, \quad r > 0, \quad k = 0, 1, \ldots \tag{4.42}$$

and gives the probability for different events when the chance of an event is
small but the total number of trials is large. Although, in principle, the
number of values of k is infinite the rapid convergence of successive terms
in (4.42) means that, in practice, the distribution function is accurately
given by the first few terms. Tables of the Poisson distribution function are
given in Appendix D.
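The limiting process that leads from the binomial to the Poisson distribution can be followed numerically by holding r = np fixed while n grows. In the sketch below the value r = 2, the sequence of n and the range of k examined are arbitrary assumptions.

```python
import math

def poisson_pmf(k, r):
    """Poisson probability, Eqn (4.42)."""
    return math.exp(-r) * r ** k / math.factorial(k)

def binomial_pmf(k, n, p):
    """Binomial probability of k successes in n trials, Eqn (4.29)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

r = 2.0  # assumed fixed value of n*p
max_diffs = []
for n in (10, 100, 10000):
    p = r / n
    # Largest discrepancy between the two distributions over k = 0, ..., 10.
    max_diffs.append(max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, r)) for k in range(11)))

print(max_diffs)
```

The discrepancy shrinks steadily as n increases with np held fixed, as the limit (4.41) suggests.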

The m.g.f. for the Poisson distribution is

$$M_k(t) = E[e^{kt}] = \sum_{k=0}^{\infty} \frac{e^{kt} e^{-r} r^k}{k!} = e^{-r} \sum_{k=0}^{\infty} \frac{(r e^t)^k}{k!} = e^{-r} \exp[r e^t]. \tag{4.43}$$

Differentiating (4.43) and setting t = 0 gives

$$\mu_1' = r,$$

$$\mu_2' = r(r + 1),$$

$$\mu_3' = r[(r + 1)^2 + r], \tag{4.44}$$

$$\mu_4' = r[r^3 + 6r^2 + 7r + 1],$$

and from (3.7)

$$\mu_2 = r,$$

$$\mu_3 = r, \tag{4.45}$$

$$\mu_4 = r(3r + 1).$$

Thus,

$$\mu = \sigma^2 = r, \tag{4.46}$$

a simple result which is very useful in practice. Also from (4.45), (3.9) and
(3.10), we have

$$\beta_1 = \frac{1}{r}; \quad \beta_2 = 3 + \frac{1}{r}. \tag{4.47}$$

From these results on the skewness parameters one might suspect that as
$r \to \infty$ the Poisson distribution tends to the normal, and indeed this is the
case.

Theorem 4.7. The limiting form of the Poisson distribution as $r \to \infty$ is
the standard form of the normal distribution.

Proof. The characteristic function of the Poisson distribution is, from
Eqns (3.27) and (4.43),

$$\phi_k(t) = e^{-r} \exp(r e^{it}).$$

If we now transform the distribution to standard measure by the relation

$$x = (k - \mu)/\sigma,$$

then

$$\phi_k(t) = \int_{-\infty}^{\infty} dx\, f(x) \exp[it(\sigma x + \mu)] = e^{it\mu}\, \phi_x(\sigma t).$$

From Eqn (4.46) we have seen that

$$\mu = \sigma^2 = r,$$

and so

$$\phi_x(t) = e^{-itr^{1/2}} e^{-r} \exp[r e^{itr^{-1/2}}],$$

and

$$\ln \phi_x(t) = -itr^{1/2} - r + r \exp(itr^{-1/2}).$$

If we now let $r \to \infty$, keeping t finite, we may expand the exponential and
we find

$$\ln \phi_x(t) = -t^2/2 + O(r^{-1/2}).$$

Thus, for any finite t

$$\phi_x(t) \to \exp(-t^2/2),$$

which is the form of the c.f. of a standardized normal distribution [cf. Eqn
(4.9)], and so, by the Inversion Theorem, the associated density function is
the standardized form of the normal distribution.

The rate of convergence to normality is the same as for the binomial
distribution [cf. Theorem 4.6], and so, in particular, the normal
approximation to the Poisson distribution is quite adequate for values of $r \ge 8$.
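The Poisson moments quoted above can be verified by summing directly over the density (4.42). The sketch below does this for the assumed value r = 8, truncating the sum at a point where the remaining terms are negligible; the forms $\beta_1 = \mu_3^2/\mu_2^3$ and $\beta_2 = \mu_4/\mu_2^2$ are assumed for Eqns (3.9) and (3.10), which is consistent with the results (4.47).

```python
import math

def poisson_pmf(k, r):
    """Poisson probability, Eqn (4.42)."""
    return math.exp(-r) * r ** k / math.factorial(k)

def poisson_mean(r, k_max=100):
    """Mean obtained by direct summation; the tail beyond k_max is negligible for moderate r."""
    return sum(k * poisson_pmf(k, r) for k in range(k_max))

def central_moment(order, r, k_max=100):
    """Central moment of the given order by direct summation."""
    mean = poisson_mean(r, k_max)
    return sum((k - mean) ** order * poisson_pmf(k, r) for k in range(k_max))

r = 8.0  # assumed example value
mean = poisson_mean(r)
mu2 = central_moment(2, r)
mu3 = central_moment(3, r)
mu4 = central_moment(4, r)

beta1 = mu3 ** 2 / mu2 ** 3  # assumed form of Eqn (3.9)
beta2 = mu4 / mu2 ** 2       # assumed form of Eqn (3.10)
print(mean, mu2, beta1, beta2)
```

As r grows, beta1 and beta2 approach the normal values 0 and 3 of Eqn (4.8).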



5 Sampling 


5.1 Basic ideas 

Since, in practice, we only have access to a sample of the whole population 
of events we have to consider carefully how best to choose a method of 
characterizing the sample such that our conclusions regarding the population 
remain relatively stable from one sample to the next. Clearly the measure 
selected should correspond to a parameter which varies little from sample to 
sample. In this section we will consider the desirable properties of such 
samples. Firstly, some definitions will be necessary. 

Definition 5.1. If $x_1, x_2, \ldots, x_n$ denotes a set of numerical values of n
observations selected from a larger set then the set of values is called a
sample of size n.

Definition 5.2. A numerical value determined from some, or all, of the 
values of a sample is called a statistic. 

Just as in Chapter 3 we defined certain useful population parameters we 
shall now define similar quantities to describe the corresponding sample 
statistic. It is conventional to use Greek letters for population parameters 
and Roman for sample parameters. 

Definition 5.3. The sample mean of a sample of size n is defined by 

(5.i) 

n i = i 

Definition 5.4. The sample variance of a sample of size n is defined by 

* 2 = £ (Xi ~ (5 - 2 ) 
n 1 i — l 

and, similarly, s is the sample standard deviation. 

The sample variance has been defined in a way not exactly analogous to
the definition of the population variance, Eqn (3.3): the factor $1/N$ of
Eqn (3.3) has been replaced by $1/(n-1)$. As will become clear in Chapter 7,
this is to ensure that the expected value of the statistic, taken over all
samples of size $n$, is equal to the corresponding population parameter.
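
The effect of the factor $1/(n-1)$ can be verified exactly for a small population. The sketch below is illustrative only: the population values and function names are made up, and sampling with replacement is assumed so that the observations are independent.

```python
from fractions import Fraction
from itertools import product


def sample_mean(xs):
    """Sample mean, Eqn (5.1)."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Sample variance with the 1/(n - 1) factor, Eqn (5.2)."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are needed")
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def expected_sample_variance(population, n):
    """Average of s^2 over all equally likely ordered samples of size n
    drawn with replacement from a small finite population."""
    values = [Fraction(v) for v in population]
    samples = list(product(values, repeat=n))
    return sum(sample_variance(s) for s in samples) / len(samples)


# Hypothetical population used only for illustration.
population = [1, 2, 3, 6]
mu = Fraction(sum(population), len(population))
sigma2 = sum((Fraction(v) - mu) ** 2 for v in population) / len(population)
print(sigma2, expected_sample_variance(population, 2))
```

Exact rational arithmetic is used so that the two printed numbers can be compared without rounding; they agree, whereas a factor $1/n$ in the sample variance would give a smaller average.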

We must also discuss how one can obtain the distribution of a sample
statistic. A formal solution to this problem is as follows. Let $x_1, \ldots, x_n$ be a
random sample of size $n$ from a density $f(x)$. We wish to find the distribution
of a sample statistic $y(x_1, \ldots, x_n)$. The distribution function of $y$ is given by

$$ F(y) = \int \cdots \int \prod_{i=1}^{n} f(x_i) \, dx_i, $$

where the integral is taken over the region such that $y \geq y(x_1, \ldots, x_n)$. In
practice, it is often convenient to let $y(x_1, \ldots, x_n)$ be a new variable and then
choose $n - 1$ other variables (functions of $x_i$) such that the $n$-dimensional
integrand above takes a simple form. We will illustrate this by an example.

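Example 5.1. We shall find the sampling distribution of the mean $\bar{x}_n$ of a
sample of size $n$ drawn from the Cauchy distribution of Section 4.4, i.e.

$$ f(x) = \frac{1}{\pi} \frac{1}{1 + x^2}, \qquad -\infty < x < \infty. $$

If we choose new variables $u_i = x_i$ $(i = 1, \ldots, n-1)$ and let $u_n = \bar{x}_n$ then
the Jacobian of the transformation is

$$ J = \frac{\partial(x_1, x_2, \ldots, x_n)}{\partial(\bar{x}_n, u_1, u_2, \ldots, u_{n-1})}, \qquad |J| = n, $$

and the distribution function becomes

$$ F(\bar{x}_n) = \int \cdots \int f(x_1) \cdots f(x_n) \, |J| \, d\bar{x}_n \prod_{i=1}^{n-1} du_i, $$

taken over the region such that $\bar{x}_n \geq \frac{1}{n} \sum_{i=1}^{n} x_i$. Thus

$$ F(\bar{x}_n) = \int_{-\infty}^{\infty} du_1 \cdots \int_{-\infty}^{\infty} du_{n-1} \int_{-\infty}^{\bar{x}_n} d\bar{x}_n \, \frac{n}{\pi^n} \prod_{j=1}^{n-1} \frac{1}{1 + u_j^2} \cdot \frac{1}{1 + \left( n\bar{x}_n - \sum_{i=1}^{n-1} u_i \right)^2}, $$

and the density function of $\bar{x}_n$ is given by the $(n-1)$-fold integration in
$u_j$ $(j = 1, \ldots, n-1)$. This integral can be evaluated but the algebra is rather
lengthy. The result is

$$ f(\bar{x}_n) = \frac{1}{\pi} \frac{1}{1 + \bar{x}_n^2}, \tag{5.3} $$

which is the same form as the population density.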
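
The result of Example 5.1 can be illustrated by simulation. The sketch below is an illustration under stated assumptions rather than a derivation: the function names, the seed and the choice of the interquartile range as a measure of spread are all introduced here. For the standard Cauchy density the quartiles lie at $\pm 1$, so the interquartile range is 2 for a single observation, and it should remain close to 2 for the mean of any number of observations.

```python
import math
import random


def cauchy_variate(rng):
    """Standard Cauchy variate, by inverting F(x) = 1/2 + arctan(x)/pi."""
    return math.tan(math.pi * (rng.random() - 0.5))


def interquartile_range_of_means(n, repeats, seed=1):
    """Interquartile range of `repeats` simulated means of n Cauchy variates."""
    rng = random.Random(seed)
    means = sorted(
        sum(cauchy_variate(rng) for _ in range(n)) / n for _ in range(repeats)
    )
    return means[(3 * repeats) // 4] - means[repeats // 4]


if __name__ == "__main__":
    for n in (1, 10, 50):
        print(n, round(interquartile_range_of_means(n, 4000), 2))
```

The spread of the sample mean does not shrink as $n$ grows, in contrast with populations of finite variance.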
Another useful method of finding sampling distributions is to find the m.g.f.,
or c.f., for the statistic, and then use either Theorem 3.2, or the Inversion 
Theorem to identify its density function. This technique is very practical 
and we shall have occasion to use it later. 


5.2 Sampling distributions: theorems 

In this subsection we will consider a number of definitions and theorems 
relating to sampling distributions in general. These will be useful when we 
consider more practical distributions in Chapter 6. 

Definition 5.5. Let $S$ denote a sample of $n$ observations $x_i$ $(i = 1, 2, \ldots, n)$
selected at random. The sample $S$ is called a random sample with replacement
(or a simple random sample) if, in general, the observation $x_{n-1}$ is returned
to the population before $x_n$ is selected. If $x_{n-1}$ is not so returned then $S$ is
called a random sample without replacement.

Sampling with replacement implies, of course, that it is indeed possible to 
return the “observation” to the population, as for example is the case when 
drawing cards from a deck. In most practical situations this is not possible 
and the sampling is without replacement. 

The following theorems hold for S. 


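Theorem 5.1. Let $N$ denote the size of any finite population and $n$ the size of
a sample without replacement, then for all possible samples of size $n$ the mean
of the means is equal to the population mean $\mu$, and the variance of the means
$\sigma_{\bar{x}}^2$ is equal to the variance of the population $\sigma^2$ multiplied by
$(N-n)/[n(N-1)]$, i.e.

$$ \mu_{\bar{x}} = \mu, \tag{5.4} $$

$$ \sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} \left( \frac{N-n}{N-1} \right). \tag{5.5} $$

If the selection is with replacement then

$$ \mu_{\bar{x}} = \mu, \tag{5.6} $$

$$ \sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}. \tag{5.7} $$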
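
For a small population Theorem 5.1 can be checked by enumerating every possible sample without replacement. The following sketch uses exact rational arithmetic; the population values and function names are hypothetical and serve only as an illustration.

```python
from fractions import Fraction
from itertools import combinations


def mean_and_variance(values):
    """Mean and variance (1/N definition) of a list of exact numbers."""
    values = [Fraction(v) for v in values]
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, var


def moments_of_sample_means(population, n):
    """Mean and variance of the sample mean over all samples of size n
    drawn without replacement (each subset of positions counted once)."""
    means = [Fraction(sum(s), n) for s in combinations(population, n)]
    return mean_and_variance(means)


if __name__ == "__main__":
    population = [2, 3, 5, 7, 11, 13]      # illustrative values only
    N = len(population)
    mu, sigma2 = mean_and_variance(population)
    for n in (2, 3):
        m, v = moments_of_sample_means(population, n)
        # Both pairs printed on a line should agree, as in (5.4) and (5.5).
        print(n, m, mu, v, sigma2 * Fraction(N - n, n * (N - 1)))
```

Counting each subset once is sufficient because every ordering of a given subset has the same mean.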


If we consider, instead of a discrete finite population, a discrete but 
infinite one then it is clear that sampling with and without replacement lose 
their distinction and Eqns (5.6) and (5.7) hold. For continuous infinite
populations we are led to the following theorem. 


Theorem 5.2. Let $x$ be a continuous random variable distributed with mean
$\mu$, variance $\sigma^2$ and density function $f(x)$. Let random samples of size $n$ be
drawn from this distribution. Then the sampling distribution of means has
mean $\mu_{\bar{x}}$ equal to the population mean, and variance $\sigma_{\bar{x}}^2$ equal to the
population variance $\sigma^2$ times a factor $1/n$, i.e.

$$ \mu_{\bar{x}} = \mu, \tag{5.8} $$

$$ \sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}. \tag{5.9} $$


As an example of the proofs of these theorems we shall prove Eqn (5.9). 


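Proof. By definition

$$ \sigma_{\bar{x}}^2 = E[(\bar{x} - E[\bar{x}])^2] = E[(\bar{x} - \mu)^2], $$

which may be written

$$ \sigma_{\bar{x}}^2 = \frac{1}{n^2} E\left[ \left( \sum_{i=1}^{n} (x_i - \mu) \right)^2 \right]. \tag{5.10} $$

If we expand the square on the right hand side there are $n$ terms of the form
$(x_i - \mu)^2$ which give contributions

$$ \int_{-\infty}^{\infty} (x_i - \mu)^2 f(x_i) \, dx_i = \sigma^2. \tag{5.11} $$

The remaining terms are of the form $(x_i - \mu)(x_j - \mu)$ with $i < j$ and
contribute terms

$$ \int\!\!\int (x_i - \mu)(x_j - \mu) f(x_i) f(x_j) \, dx_i \, dx_j
   = \int_{-\infty}^{\infty} (x_i - \mu) f(x_i) \, dx_i \int_{-\infty}^{\infty} (x_j - \mu) f(x_j) \, dx_j
   = 0. \tag{5.12} $$

Thus

$$ \sigma_{\bar{x}}^2 = \frac{1}{n^2} \sum_{i=1}^{n} \sigma^2 = \frac{\sigma^2}{n}. \tag{5.13} $$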


The above theorems are of considerable importance because they show
that as the sample size increases the variance of the sample mean decreases,
and thus the probability that the sample mean is a good estimate of the
population mean increases. This result may be stated formally as the Weak
Law of Large Numbers.


Weak law of large numbers 

Let $x_i$ be a population of independent random variables with mean $\mu$ and
finite variance. Let $\bar{x}_n$ be the mean of a sample of size $n$

$$ \bar{x}_n = \frac{1}{n} \sum_{i=1}^{n} x_i. $$

Then, given any $\varepsilon > 0$ and $\delta$ in the range $0 < \delta < 1$, there exists an integer
$n$ such that for all $m \geq n$

$$ P[\,|\bar{x}_m - \mu| < \varepsilon\,] \geq 1 - \delta. \tag{5.14} $$

The Weak Law of Large Numbers tells us that $|\bar{x}_n - \mu|$ will ultimately be
very small but does not exclude the possibility that for some finite $n$ it could
be large. Since, in practice, we can only have access to finite samples this
possibility could be of some importance. Fortunately there exists the so-called
Strong Law of Large Numbers, which, in effect, states that the probability
of such an occurrence is extremely small. It is the Laws of Large
Numbers which ensure that the empirical definition of probability we
have adopted concurs in practice with the axiomatic one.

The Weak Law of Large Numbers is a special case of Tchebysheff’s 
Inequality which may be stated as follows: 

Tchebysheff’s inequality 

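Let $f(x)$ be a density function with mean $\mu$ and finite variance $\sigma^2$. Let
$p$ be any positive number, and let $\bar{x}_n$ be the mean of a random sample of size
$n$ drawn from $f(x)$. Then

$$ P\left[\, |\bar{x}_n - \mu| < \frac{p\sigma}{n^{1/2}} \,\right] \geq 1 - \frac{1}{p^2}. \tag{5.15} $$

Proof. Let the density function of $\bar{x}_n$ be $g(\bar{x}_n)$. Then from Theorem 5.2 we
have

$$ \sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} = \int_{-\infty}^{\infty} (\bar{x}_n - \mu)^2 g(\bar{x}_n) \, d\bar{x}_n, \tag{5.16} $$

$$ = \int_{-\infty}^{\mu - (p\sigma/n^{1/2})} (\bar{x}_n - \mu)^2 g(\bar{x}_n) \, d\bar{x}_n
   + \int_{\mu - (p\sigma/n^{1/2})}^{\mu + (p\sigma/n^{1/2})} (\bar{x}_n - \mu)^2 g(\bar{x}_n) \, d\bar{x}_n
   + \int_{\mu + (p\sigma/n^{1/2})}^{\infty} (\bar{x}_n - \mu)^2 g(\bar{x}_n) \, d\bar{x}_n. \tag{5.17} $$

Now if we replace $(\bar{x}_n - \mu)^2$ by $p^2\sigma^2/n$ in the first integral the value of this
integral will clearly not increase, since $(\bar{x}_n - \mu)^2 \geq p^2\sigma^2/n$ throughout
its range. The same argument holds for the third integral, and the second
integral is nonnegative, so we may obtain from (5.17) the inequality

$$ \frac{\sigma^2}{n} \geq \frac{p^2\sigma^2}{n} \left\{ \int_{-\infty}^{\mu - (p\sigma/n^{1/2})} g(\bar{x}_n) \, d\bar{x}_n + \int_{\mu + (p\sigma/n^{1/2})}^{\infty} g(\bar{x}_n) \, d\bar{x}_n \right\}, \tag{5.18} $$

which, from the definition of a distribution function, is just the statement
that

$$ \frac{1}{p^2} \geq P\left[\, |\bar{x}_n - \mu| \geq \frac{p\sigma}{n^{1/2}} \,\right], $$

or

$$ P\left[\, |\bar{x}_n - \mu| < \frac{p\sigma}{n^{1/2}} \,\right] \geq 1 - \frac{1}{p^2}, \tag{5.19} $$

which completes the proof.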
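
The inequality can be checked exactly for a simple discrete population. The sketch below uses the mean of $n$ fair dice; this example, the function names and the use of a discrete rather than a continuous population are assumptions made here for illustration (the inequality holds in both cases).

```python
from fractions import Fraction


def sum_distribution(n, faces=6):
    """Exact distribution {total: probability} of the sum of n fair dice."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        new = {}
        for total, prob in dist.items():
            for face in range(1, faces + 1):
                new[total + face] = new.get(total + face, 0) + prob / faces
        dist = new
    return dist


def tail_probability(n, p, faces=6):
    """P[|mean - mu| >= p*sigma/sqrt(n)] for the mean of n fair dice."""
    mu = Fraction(faces + 1, 2)
    var = Fraction(faces * faces - 1, 12)
    # Compare squares, (mean - mu)^2 >= p^2 sigma^2 / n, to avoid square roots.
    bound = p * p * var / n
    return sum(prob for total, prob in sum_distribution(n, faces).items()
               if (Fraction(total, n) - mu) ** 2 >= bound)


if __name__ == "__main__":
    for n in (1, 5, 20):
        for p in (1, 2, 3):
            print(n, p, float(tail_probability(n, p)), 1 / p ** 2)
```

The exact tail probabilities printed are typically far below $1/p^2$, which illustrates why the bound is described as weak.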

The Weak Law of Large Numbers may be proved from Tchebysheff's
inequality provided the population distribution has a finite variance. We
merely choose $p = \delta^{-1/2}$ and $n > \sigma^2/(\delta\varepsilon^2)$, and substitute in Eqn (5.19). The
bound given by (5.15) is usually weak, but if we restrict ourselves to the
sampling distribution of the mean then we can state the most important
theorem in statistics, the Central Limit Theorem.


Theorem 5.3 (Central Limit Theorem). Let the independent random
variables $x_i$, of unknown density function, be identically distributed with mean
$\mu$ and variance $\sigma^2$, both of which are finite. Then the distribution of the sample
mean $\bar{x}_n$ tends to the normal distribution with mean $\mu$ and variance $\sigma^2/n$ when
$n$ becomes large. Thus, if $u(t)$ is the standard form of the normal density
function, then for arbitrary $t_1$ and $t_2$

$$ \lim_{n \to \infty} P\left[ t_1 \leq \frac{\bar{x}_n - \mu}{\sigma/n^{1/2}} \leq t_2 \right] = \int_{t_1}^{t_2} u(t) \, dt. \tag{5.20} $$

Proof. By applying the results on expected values given previously in
Eqns (3.21) and (3.22) to m.g.f.'s it follows immediately that if the
components of the sample are independent then the mean and variance of their
sum $S = \sum_{i=1}^{n} x_i$ are given by

$$ \mu_S = n\mu \quad \text{and} \quad \sigma_S^2 = n\sigma^2. $$

Now consider the variable

$$ u = \frac{S - \mu_S}{\sigma_S} = \frac{1}{n^{1/2}\sigma} \sum_{i=1}^{n} (x_i - \mu), \tag{5.21} $$

with c.f. $\phi_u(t)$. If $\phi_i(t)$ is the c.f. of $(x_i - \mu)$, then

$$ \phi_u(t) = \prod_{i=1}^{n} \phi_i\!\left( \frac{t}{n^{1/2}\sigma} \right), $$

but all the $(x_i - \mu)$ have the same distribution and so

$$ \phi_u(t) = \left[ \phi_i\!\left( \frac{t}{n^{1/2}\sigma} \right) \right]^n. \tag{5.22} $$

Just as the m.g.f. can be expanded into an infinite series of moments so we
can expand the c.f. Thus

$$ \phi(t) = 1 + \sum_{r=1}^{\infty} \mu_r' \frac{(it)^r}{r!}, \tag{5.23} $$

and since the first two moments of $(x_i - \mu)$ are zero and $\sigma^2$ we have from
(5.22) and (5.23)

$$ \phi_u(t) = \left[ 1 - \frac{t^2}{2n} + o\!\left( \frac{1}{n} \right) \right]^n. $$

Thus, for fixed $t$, as $n \to \infty$

$$ \phi_u(t) \to e^{-t^2/2}, \tag{5.24} $$

which is the c.f. of a standardized normal distribution. Thus by the Inversion
Theorem $S$ is distributed as $N(S; \mu_S, \sigma_S^2)$ and hence $\bar{x}_n$ is distributed as
$N(\bar{x}_n; \mu, \sigma^2/n)$.

The form of the Central Limit Theorem above is not the most general 
that can be given. For example, provided certain (weak) conditions on the 
third moments are obeyed then the condition that the $x_i$ all have the same
distribution can be relaxed, and it is possible to prove that the sampling 
distribution of any linear combination of independent random variables 
having arbitrary distributions but with finite means and variances tends to 
normality for large samples. There are also circumstances under which the 
assumption of independence can be relaxed. 

The Central Limit Theorem applies to both discrete and continuous 
distributions and is a most remarkable theorem because nothing is said about 
the original density function, except that it have finite mean and variance, 
which in practice are seldom restrictions. However, this condition is essential. 
Thus we have seen in Example 5.1 that for the Cauchy distribution the
distribution of $\bar{x}_n$ is

$$ f(\bar{x}_n) = \frac{1}{\pi} \frac{1}{1 + \bar{x}_n^2}, $$

i.e. the same as for a single observation. The failure of the Central Limit
Theorem in this case can be traced to the infinite variance of the Cauchy
distribution. It is the Central Limit Theorem which gives the normal
distribution such a prominent position both theoretically and in practice. In
particular, it allows (approximate) quantitative probability statements to be
made in experimental situations where the exact form of the underlying
distribution is unknown.
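
A simple simulation shows the theorem at work for a markedly non-normal population. The sketch below is an illustration only: the exponential population (for which $\mu = \sigma = 1$), the seed and the function name are choices made here. It estimates the probability that the sample mean lies within $\sigma/n^{1/2}$ of $\mu$, which for a normal distribution is about 0.683.

```python
import math
import random


def fraction_within_one_sigma(n, repeats, seed=2):
    """Fraction of simulated sample means lying within sigma/sqrt(n) of mu,
    for samples of size n from an exponential population (mu = sigma = 1)."""
    rng = random.Random(seed)
    half_width = 1.0 / math.sqrt(n)
    inside = 0
    for _ in range(repeats):
        mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if abs(mean - 1.0) < half_width:
            inside += 1
    return inside / repeats


if __name__ == "__main__":
    for n in (1, 5, 30):
        print(n, fraction_within_one_sigma(n, 20000))
```

For $n = 1$ the exact value is $1 - e^{-2} \approx 0.865$; as $n$ increases the simulated fraction moves towards the normal value.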

Just as we have been considering the sampling distribution of means we
can also consider the sampling distribution of sums $T = \sum x_i$ of random
samples of size $n$. If the random variable $x$ is distributed with mean $\mu$, and
variance $\sigma^2$, then the sampling distribution of $T$ has mean

$$ \mu_T = n\mu, \tag{5.25} $$

and variance

$$ \sigma_T^2 = \begin{cases}
   n\sigma^2 \left( \dfrac{N-n}{N-1} \right) & \text{for sampling from a finite population of size } N \text{ without replacement,} \\
   n\sigma^2 & \text{otherwise.}
   \end{cases} \tag{5.26} $$
We shall conclude this section with a few results on the properties of
linear combinations of means, since up to now we have been concerned mainly
with sampling distributions of a single sample mean.

Theorem 5.4. Let

$$ l = \sum_{i=1}^{n} a_i x_i, \tag{5.27} $$

where $a_i$ are real constants, and the $x_i$ are random variables with means $\mu_i$,
variances $\sigma_i^2$ and covariances $\sigma_{ij}$ $(i, j = 1, 2, \ldots, n;\ i \neq j)$. Then

$$ \mu_l = \sum_{i=1}^{n} a_i \mu_i, \tag{5.28} $$

and

$$ \sigma_l^2 = \sum_{i=1}^{n} a_i^2 \sigma_i^2 + 2 \sum_{i<j} a_i a_j \sigma_{ij}, \tag{5.29} $$

$$ \phantom{\sigma_l^2} = \sum_{i=1}^{n} a_i^2 \sigma_i^2, \quad \text{if the } x_i\text{'s are mutually independent.} \tag{5.30} $$


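Proof. Let $f(x_1, x_2, \ldots, x_n)$ be the joint density function of $x_1, x_2, \ldots, x_n$.
Then using Eqns (3.33)

$$ \mu_l = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} l \, f(x_1, x_2, \ldots, x_n) \prod_{i=1}^{n} dx_i $$

$$ = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left( \sum_{j=1}^{n} a_j x_j \right) f(x_1, x_2, \ldots, x_n) \prod_{i=1}^{n} dx_i $$

$$ = \sum_{j=1}^{n} a_j \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_j f(x_1, x_2, \ldots, x_n) \prod_{i=1}^{n} dx_i $$

$$ = \sum_{j=1}^{n} a_j \mu_j. $$

Also, using Eqns (3.34) and (3.36)

$$ \sigma_l^2 = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (l - \mu_l)^2 f(x_1, x_2, \ldots, x_n) \prod_{i=1}^{n} dx_i $$

$$ = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left[ \sum_{j=1}^{n} a_j (x_j - \mu_j) \right]^2 f(x_1, x_2, \ldots, x_n) \prod_{i=1}^{n} dx_i $$

$$ = \sum_{j=1}^{n} a_j^2 \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (x_j - \mu_j)^2 f(x_1, x_2, \ldots, x_n) \prod_{i=1}^{n} dx_i $$

$$ \quad + 2 \sum_{k<j} a_k a_j \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (x_k - \mu_k)(x_j - \mu_j) f(x_1, x_2, \ldots, x_n) \prod_{i=1}^{n} dx_i $$

$$ = \sum_{i=1}^{n} a_i^2 \sigma_i^2 + 2 \sum_{k<j} a_k a_j \sigma_{kj}, $$

which completes the proof.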

A useful corollary to the above theorem is as follows. Let $\bar{x}_i$ $(i = 1, 2)$ be
the mean of a random sample of size $n_i$ drawn from an infinite population
with mean $\mu_i$ and variance $\sigma_i^2$. If $\bar{x}_1$ and $\bar{x}_2$ are independently distributed,
then

$$ \mu_{\bar{x}_1 \pm \bar{x}_2} = \mu_1 \pm \mu_2, \tag{5.31} $$

and

$$ \sigma_{\bar{x}_1 \pm \bar{x}_2}^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}. \tag{5.32} $$

These results follow immediately from Eqns (5.28) and (5.29), and Theorem
5.2, by the substitution $x_1 = \bar{x}_1$ and $x_2 = \bar{x}_2$ with $a_1 = a_2 = 1$ for the sum
and $a_1 = -a_2 = 1$ for the difference.
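
Equations (5.28) and (5.29) translate directly into a short routine. The following is a minimal sketch with hypothetical names; `cov` is assumed to be the full symmetric matrix whose diagonal holds the variances $\sigma_i^2$ and whose off-diagonal elements hold the covariances $\sigma_{ij}$.

```python
def linear_combination_moments(a, means, cov):
    """Mean and variance of l = sum_i a_i x_i, following Eqns (5.28), (5.29)."""
    n = len(a)
    mean_l = sum(a[i] * means[i] for i in range(n))
    # Diagonal (variance) terms.
    var_l = sum(a[i] ** 2 * cov[i][i] for i in range(n))
    # Off-diagonal (covariance) terms, each pair i < j counted twice.
    var_l += 2 * sum(a[i] * a[j] * cov[i][j]
                     for i in range(n) for j in range(i + 1, n))
    return mean_l, var_l


if __name__ == "__main__":
    # Difference of two correlated variables (illustrative numbers).
    print(linear_combination_moments([1, -1], [5, 3], [[4, 1], [1, 9]]))
    # Difference of two independent sample means, as in (5.31) and (5.32):
    # variances sigma_1^2/n_1 = 4/10 and sigma_2^2/n_2 = 9/30.
    print(linear_combination_moments([1, -1], [5, 3], [[0.4, 0], [0, 0.3]]))
```

Setting the off-diagonal elements to zero reproduces the independent case (5.30).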

5.3 Experimental errors and their propagation 

In the preceding sections we have been concerned with theoretical statistics 
only. In this subsection we will provide the link between theoretical statistics 
and experimental situations. 

In an experimental observation one can never measure the value of a
quantity with absolute precision, that is one can never reduce the error on
the measurement to zero. The precision of a measurement will be taken to
mean the smallness of the error. By accuracy we shall mean the deviation of
the observation from the "true" value, assuming that such a concept is
meaningful. Thus there may exist, in addition to fluctuations in the
measurement process which limit the precision, unknown systematic errors which
limit the accuracy. In general, the only errors that we can deal with here are
the former type, and the conventional measure of this type of error is taken
to be the standard deviation $\sigma$. In this case, it is called the standard error.
This definition of the error is, of course, arbitrary, and some workers still
use the older probable error $p$ which is defined by

$$ \int_{\mu - p}^{\mu + p} f(x) \, dx = \frac{1}{2}. $$
Unfortunately some authors do not say which form of error they are
using, or even multiply their errors by an arbitrary factor “to be on the safe 
side” when quoting results. Needless to say such practices render statistical 
analyses meaningless and are to be discouraged. 

Consider, for example, an idealized nuclear counting experiment for a
scattering process. The number of trials, i.e. the number of incoming particles,
is very large, but the probability of a scatter, $p$, is very small. In this situation
the Poisson distribution (Section 4.7) is applicable, and as we have seen in
Eqn (4.46), if $N_e = np$ is the total number of counts recorded then $\sigma = (N_e)^{1/2}$.
The result of the experiment would be given as

$$ N = N_e \pm \Delta N, \tag{5.33} $$

where

$$ \Delta N = (N_e)^{1/2}. \tag{5.34} $$

If the population distribution is unknown then we can consider the
sampling distribution. For example, from a set of observations $x_i$ we know
that an estimate of the mean is the sample mean

$$ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \tag{5.35} $$

and the Laws of Large Numbers ensure that $\bar{x}$ is a good estimate for large
$n$. The variance of $\bar{x}$ is

$$ \sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}, \tag{5.36} $$

so to calculate $\sigma_{\bar{x}}^2$ we need to estimate $\sigma^2$. We have seen that the sample
variance is

$$ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2, \tag{5.37} $$

and thus

$$ \sigma_{\bar{x}}^2 \approx \frac{1}{n(n-1)} \sum_{i=1}^{n} (x_i - \bar{x})^2. \tag{5.38} $$

The experimental result would then be quoted as

$$ x = x_e \pm \Delta x, \tag{5.39} $$

where

$$ \Delta x = \sigma_{\bar{x}}. $$


Now by the Central Limit Theorem we know that the distribution of the
sample means is approximately normal, and so Eqn (5.39) may be interpreted
(cf. Section 4.2) as

$$ P[x_e - \Delta x < x < x_e + \Delta x] \approx 68.3\% $$

$$ P[x_e - 2\Delta x < x < x_e + 2\Delta x] \approx 95.4\% $$

$$ P[x_e - 3\Delta x < x < x_e + 3\Delta x] \approx 99.7\%. $$

Thus, even though the form of the underlying distribution of $x$ is unknown,
the Central Limit Theorem has enabled us to make an approximate
quantitative statement about the probability of the true value of $x$ lying within a
specified range.
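
The two ways of quoting a result described in this section can be sketched in a few lines. The function names and the sample values below are hypothetical; the first function follows Eqns (5.33) and (5.34), the second Eqns (5.35) to (5.39) with the standard error estimated from the sample.

```python
import math


def counting_result(n_counts):
    """Recorded count N_e and its standard error (N_e)^(1/2)."""
    return n_counts, math.sqrt(n_counts)


def mean_with_error(xs):
    """Sample mean and the estimated standard error of the mean,
    i.e. the square root of sum (x_i - xbar)^2 / [n (n - 1)]."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are needed")
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return xbar, math.sqrt(s2 / n)


if __name__ == "__main__":
    print("N = %d +/- %.1f" % counting_result(400))
    print("x = %.3f +/- %.3f" % mean_with_error([9.8, 10.1, 10.0, 10.3, 9.8]))
```

The quoted error is one standard error, so the probability statements above apply to it only approximately and only when $n$ is not too small.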


5.3.1. Propagation of Errors 
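If we have a function $y$ of the $p$ parameters $\theta_i$ $(i = 1, \ldots, p)$, i.e.

$$ y = y(\theta) = y(\theta_1, \theta_2, \ldots, \theta_p), $$

then we are often interested in knowing the approximate error on $y$, given
that we know the errors on $\theta_i$. If the true values of $\theta_i$ are $\theta_i^*$ (in practice
estimates of these quantities would usually have to be used), and the
quantities $(\theta_i - \theta_i^*)$ are small, then a Taylor expansion of $y(\theta)$ about the point
$\theta = \theta^*$ gives, to first order

$$ y(\theta) = y(\theta^*) + \sum_{i=1}^{p} \left. \frac{\partial y(\theta)}{\partial \theta_i} \right|_{\theta = \theta^*} (\theta_i - \theta_i^*). \tag{5.41} $$

Now

$$ \operatorname{var} y(\theta) = E[(y(\theta) - E[y(\theta)])^2] \approx E[(y(\theta) - y(\theta^*))^2]. \tag{5.42} $$

Using (5.41) in (5.42) gives

$$ \operatorname{var} y(\theta) \approx \sum_{i=1}^{p} \sum_{j=1}^{p} \left. \frac{\partial y(\theta)}{\partial \theta_i} \frac{\partial y(\theta)}{\partial \theta_j} \right|_{\theta = \theta^*} E[(\theta_i - \theta_i^*)(\theta_j - \theta_j^*)]. \tag{5.43} $$

Now, from Eqn (3.36) we see that

$$ V_{ij} = E[(\theta_i - \theta_i^*)(\theta_j - \theta_j^*)], $$

is the matrix of variances and covariances of the parameters (the variance
matrix). Thus, if we set

$$ (\Delta y)^2 = \operatorname{var} y, $$

we have

$$ (\Delta y)^2 = \sum_{i=1}^{p} \sum_{j=1}^{p} \left. \frac{\partial y(\theta)}{\partial \theta_i} \frac{\partial y(\theta)}{\partial \theta_j} \right|_{\theta = \theta^*} V_{ij}. \tag{5.44} $$

Equation (5.44) is often referred to as the Law of Propagation of Errors. If
the errors are uncorrelated (i.e. $\operatorname{cov}(\theta_i, \theta_j) = 0$) then

$$ V_{ij} = \begin{cases} (\Delta\theta_i)^2, & i = j \\ 0, & i \neq j, \end{cases} $$

and (5.44) reduces to

$$ (\Delta y)^2 = \sum_{i=1}^{p} \left( \left. \frac{\partial y(\theta)}{\partial \theta_i} \right|_{\theta = \theta^*} \right)^2 (\Delta\theta_i)^2. \tag{5.45} $$

When using these expressions one should always ensure that the quantities
$\Delta\theta_i = \theta_i - \theta_i^*$ are small enough to justify truncation of the Taylor series
(5.41) at the first order in $\Delta\theta_i$.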
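
Equation (5.44) is easily evaluated numerically when the derivatives are inconvenient to write down. The sketch below is one possible implementation under stated assumptions: the derivatives are approximated by central differences with a step size chosen here, and the function and parameter names are hypothetical.

```python
import math


def propagate_error(func, theta, variance_matrix, rel_step=1e-6):
    """Approximate error on y = func(theta) from Eqn (5.44), using
    central-difference estimates of the first derivatives at theta."""
    p = len(theta)
    grads = []
    for i in range(p):
        h = rel_step * max(abs(theta[i]), 1.0)
        up = list(theta)
        down = list(theta)
        up[i] += h
        down[i] -= h
        grads.append((func(up) - func(down)) / (2.0 * h))
    var_y = sum(grads[i] * grads[j] * variance_matrix[i][j]
                for i in range(p) for j in range(p))
    return math.sqrt(var_y)


if __name__ == "__main__":
    def area(theta):
        # Illustrative function: the product of two measured lengths.
        return theta[0] * theta[1]

    # Uncorrelated errors of 0.1 and 0.2 on lengths 2.0 and 3.0.
    print(propagate_error(area, [2.0, 3.0], [[0.01, 0.0], [0.0, 0.04]]))
    # The same errors with a covariance of 0.01.
    print(propagate_error(area, [2.0, 3.0], [[0.01, 0.01], [0.01, 0.04]]))
```

As the text warns, the result is only as good as the first-order Taylor expansion; for strongly non-linear functions or large errors it should be treated with caution.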



6 Sampling Distributions Associated with the Normal 

The special position held by the normal distribution, mainly by virtue of 
the Central Limit Theorem, is reflected in the prominent positions of certain 
related distributions. In this chapter we will consider the basic properties of 
three frequently used distributions; the Chi-square, the Student t and F 
distributions. These arise when sampling from normal populations, and are 
widely used in estimation problems (i.e. finding the best values of parameters), 
and in the testing of hypotheses. These topics will be discussed in detail in 
Chapters 7-11. 


6.1 Chi-square distribution 

If we wish to concentrate on a measure to describe the dispersion of a 
population then we consider the sample variance. The chi-square distribution 
is introduced for problems involving this quantity. 

Theorem 6.1. If $x_i$ $(i = 1, 2, \ldots, \nu)$ is a sample of $\nu$ random variables normally
and independently distributed with means $\mu_i$ and variances $\sigma_i^2$, then the
statistic

$$ \chi^2 = \sum_{i=1}^{\nu} \left( \frac{x_i - \mu_i}{\sigma_i} \right)^2 \tag{6.1} $$

is distributed with density function

$$ f(\chi^2; \nu) = \frac{1}{2^{\nu/2} \Gamma(\nu/2)} (\chi^2)^{(\nu/2) - 1} \exp(-\chi^2/2), \qquad \chi^2 > 0, \tag{6.2} $$

mean $\nu$ and variance $2\nu$.

This is known as the $\chi^2$-distribution (chi-square) with $\nu$ degrees of freedom.
(It is perhaps unfortunate that the symbol $\chi^2$ should have been chosen for
both the statistic and the distribution.) The symbol $\Gamma(x)$ in Eqn (6.2) is the
Gamma function, defined by the integral

$$ \Gamma(x) = \int_{0}^{\infty} du \, e^{-u} u^{x-1}, \qquad 0 < x < \infty, \tag{6.3} $$

and is frequently encountered in sampling distributions associated with the
normal distribution. To prove this theorem we shall again use the method of
characteristic functions.


Proof. Let

$$ z_i = \frac{x_i - \mu_i}{\sigma_i}. $$

Then the $z_i$ are distributed as $N(z_i; 0, 1)$, and from Example 3.2 we know
that the quantities $u_i = z_i^2$ have density functions

$$ f(u_i) = (2\pi u_i)^{-1/2} \exp(-u_i/2), \qquad (u_i > 0). $$

Thus the c.f. of $u_i$ is

$$ \phi_i(t) = (1 - 2it)^{-1/2}. $$

Now since the random variables $u_i$ are independently distributed, if $\phi(t)$ is
the c.f. of $\chi^2$, then

$$ \phi(t) = \prod_{i=1}^{\nu} \phi_i(t) = (1 - 2it)^{-\nu/2}. \tag{6.4} $$

The density function of $\chi^2$ is now obtainable from the Inversion Theorem.
It is

$$ f(\chi^2; \nu) = \frac{1}{2\pi} \int_{-\infty}^{\infty} dt \, (1 - 2it)^{-\nu/2} e^{-it\chi^2}. $$

This integral may be evaluated using the definition of $\Gamma(x)$ in Eqn (6.3) and
gives

$$ f(\chi^2; \nu) = \frac{1}{2^{\nu/2} \Gamma(\nu/2)} (\chi^2)^{(\nu/2) - 1} \exp(-\chi^2/2), $$

which is the required result.

The m.g.f. is obtainable directly from Eqns (6.4) and (3.23), and is

$$ M(t) = (1 - 2t)^{-\nu/2}. $$

It follows that

$$ \mu = \nu \quad \text{and} \quad \sigma^2 = 2\nu, $$

which completes the proof.
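
The density (6.2) and the stated mean and variance can be checked by direct numerical integration. The following sketch is illustrative: the function names, the midpoint rule and the integration limits are choices made here, and the check is restricted to $\nu \geq 2$ because for $\nu = 1$ the density is singular at the origin and a simple midpoint rule converges poorly.

```python
import math


def chi2_pdf(x, nu):
    """Chi-square density of Eqn (6.2), evaluated via logarithms."""
    if x <= 0.0:
        return 0.0
    log_f = ((nu / 2.0 - 1.0) * math.log(x) - x / 2.0
             - (nu / 2.0) * math.log(2.0) - math.lgamma(nu / 2.0))
    return math.exp(log_f)


def chi2_moment(nu, power, upper=100.0, steps=40000):
    """Midpoint-rule estimate of the integral of x**power * f(x; nu)
    over (0, upper); the upper limit truncates a negligible tail."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += x ** power * chi2_pdf(x, nu)
    return total * h


if __name__ == "__main__":
    nu = 4
    norm = chi2_moment(nu, 0)
    mean = chi2_moment(nu, 1)
    variance = chi2_moment(nu, 2) - mean ** 2
    print(norm, mean, variance)   # compare with 1, nu and 2*nu
```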
Graphs of $f(\chi^2; \nu)$ and its distribution function $F(\chi^2; \nu)$ for $\nu = 1$, 4 and
10 are shown in Fig. (6.1), and tables of $F(\chi^2; \nu)$ are given in Appendix D.

The third and fourth moments about the mean may also be found from
the m.g.f. $M(t)$. They are

$$ \mu_3 = 8\nu; \qquad \mu_4 = 12\nu(\nu + 4), $$

giving

$$ \beta_1 = \frac{8}{\nu}; \qquad \beta_2 = 3\left( 1 + \frac{4}{\nu} \right), $$

which tend to the values for the normal distribution as $\nu \to \infty$. The $\chi^2$
distribution does indeed tend to normality for large samples and we can
demonstrate this by constructing the c.f. for the standardized variable

$$ y = \frac{\chi^2 - \nu}{(2\nu)^{1/2}}. $$

Fig. 6.1. The chi-square density $f(\chi^2; \nu)$, and its distribution function $F(\chi^2; \nu)$, for $\nu = 1$,
4 and 10.

From (6.4) it is

$$ \phi_y(t) = \exp\left[ \frac{-i\nu t}{(2\nu)^{1/2}} \right] \left[ 1 - \frac{2it}{(2\nu)^{1/2}} \right]^{-\nu/2}, $$

and hence, taking logarithms and letting $\nu \to \infty$ gives

$$ \ln \phi_y(t) \to -\frac{t^2}{2}, \qquad (\nu \to \infty), $$

i.e.

$$ \phi_y(t) \to \exp(-t^2/2), $$

which is the c.f. of the standardized normal distribution. Thus, by the
Inversion Theorem, the $\chi^2$ distribution tends to normality as $\nu \to \infty$, although
the rate of convergence is rather slow.

Because the $\chi^2$ distribution is a one-parameter family it frequently happens
that tabulated values do not exist for precisely the range one requires. In
such cases a very useful statistic is $(2\chi^2)^{1/2}$ which can be shown to tend very
rapidly to normality with mean $\mu = (2\nu - 1)^{1/2}$ and unit variance. Thus the
statistic

$$ u = (2\chi^2)^{1/2} - (2\nu - 1)^{1/2}, $$

for even quite moderate $\nu$ is approximately a standard normal deviate, and so
tables of the normal distribution may be used.

Table (6.1) shows a comparison between the exact distribution function
and the normal approximation based on the statistic $(2\chi^2)^{1/2}$ for a range of
values of $\nu$ and $\chi^2$.
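
A comparison of this kind can be sketched as follows. The names are hypothetical, and the exact tail probability is computed only for even $\nu$, using the standard identity $P[\chi^2 > x] = e^{-x/2} \sum_{k=0}^{\nu/2 - 1} (x/2)^k / k!$, which is brought in here for convenience and is not taken from the text.

```python
import math


def chi2_upper_tail_even(x, nu):
    """Exact P[chi2 > x] for even nu, as a finite Poisson-type sum."""
    if nu <= 0 or nu % 2:
        raise ValueError("this closed form needs a positive even nu")
    half = x / 2.0
    term = math.exp(-half)          # k = 0 term
    total = term
    for k in range(1, nu // 2):
        term *= half / k            # (x/2)^k / k! from the previous term
        total += term
    return total


def chi2_upper_tail_normal(x, nu):
    """Approximate P[chi2 > x] from u = (2x)^(1/2) - (2nu - 1)^(1/2),
    treating u as a standard normal deviate."""
    u = math.sqrt(2.0 * x) - math.sqrt(2.0 * nu - 1.0)
    return 0.5 * math.erfc(u / math.sqrt(2.0))


if __name__ == "__main__":
    for nu in (10, 20):
        for x in (10, 20, 30):
            print(nu, x, round(chi2_upper_tail_even(x, nu), 3),
                  round(chi2_upper_tail_normal(x, nu), 3))
```

The exact column can be compared with the tabulated values for $\nu = 10$ and 20; the approximate column agrees with it to within a few units in the second decimal place.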

It follows directly from the definition of $\chi^2$ that the sum of $n$ independent
random variables $\chi_1^2, \chi_2^2, \ldots, \chi_n^2$ each having chi-square distributions with
$\nu_1, \nu_2, \ldots, \nu_n$ degrees of freedom, respectively, is itself distributed as $\chi^2$ with
$\nu = \nu_1 + \nu_2 + \ldots + \nu_n$ degrees of freedom. We shall refer to this as the
additive property of $\chi^2$.

Another important result which we shall need later is contained in the 
following theorem. 

Theorem 6.2. Let $x_1, x_2, \ldots, x_\nu$ be a sample of size $\nu$ drawn from a normal
population with mean zero and unit variance. Then the statistic

$$ u = \sum_{i=1}^{\nu} (x_i - \bar{x})^2, $$

is distributed as $\chi^2$ with $(\nu - 1)$ degrees of freedom.




Table 6.1. Values of $P[\chi^2 > \chi_\alpha^2]$ for $\nu = 5$, 10 and 20 and $\chi_\alpha^2 = 2$, 5, 10, 20, 30,
using the exact $\chi^2$ distribution function, and the normal approximation.

                    nu = 5             nu = 10            nu = 20
    chi_alpha^2  exact  approx.     exact  approx.     exact  approx.
         2       0.849   0.841      0.996   0.991
         5       0.416   0.436      0.891   0.885
        10       0.075   0.071      0.441   0.456      0.968   0.963
        20       0.001   0.001      0.029   0.024      0.458   0.462
        30                          0.001   0.000      0.070   0.067

Proof. Consider the transformation of variables defined by

$$ u_1 = (x_1 - x_2)/\sqrt{2}, $$
$$ u_2 = (x_1 + x_2 - 2x_3)/\sqrt{6}, $$
$$ \vdots $$
$$ u_{\nu-1} = (x_1 + x_2 + \ldots + x_{\nu-1} - (\nu - 1)x_\nu)/\sqrt{\nu(\nu - 1)}, $$
$$ u_\nu = (x_1 + x_2 + \ldots + x_\nu)/\sqrt{\nu}. $$

It can be easily verified that if the $x_i$ are independently normally distributed
with mean zero and unit variance then so are the variables $u_i$. Now consider
the statistic

$$ \sum_{i=1}^{\nu} (x_i - \bar{x})^2 = \sum_{i=1}^{\nu} x_i^2 - \nu\bar{x}^2 = \sum_{i=1}^{\nu} u_i^2 - u_\nu^2 = \sum_{i=1}^{\nu-1} u_i^2, $$

where we have used the fact that the transformation is orthogonal, so that
$\sum u_i^2 = \sum x_i^2$, and that $u_\nu = \sqrt{\nu}\,\bar{x}$.
Thus the sum of squares of $\nu$ standard normal variates measured from their
mean is distributed as the sum of squares of $(\nu - 1)$ independent normal
variates with mean zero and unit variance.
It follows from Theorem 6.1 that the statistic

$$ u = \sum_{i=1}^{\nu} (x_i - \bar{x})^2, $$

is distributed as $\chi^2$ with $(\nu - 1)$ degrees of freedom.

In general, if the parent population has variance $\sigma^2$, then

$$ \chi^2 = \frac{1}{\sigma^2} \sum_{i=1}^{\nu} (x_i - \bar{x})^2, $$

is distributed as $\chi^2$ with $(\nu - 1)$ degrees of freedom. Moreover, since the
sample variance is

$$ s^2 = \frac{1}{\nu - 1} \sum_{i=1}^{\nu} (x_i - \bar{x})^2, $$

it follows that $(\nu - 1)s^2/\sigma^2$ is distributed as $\chi^2$ with $(\nu - 1)$ degrees of
freedom, independent of the sample mean $\bar{x}$. Thus we have also proved the
following theorem.

Theorem 6.3. The sample mean and sample variance are independent
random variables when sampling randomly from normal populations.

This somewhat surprising result is very important in practice and we will 
use it later to construct the sampling distribution known as the Student t 
distribution. 
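
These statements can be illustrated by simulation. In the sketch below the population parameters, the seed and the function names are arbitrary choices; it generates many normal samples of size $n$ and looks at the mean and variance of $(n-1)s^2/\sigma^2$, which should be close to $n - 1$ and $2(n - 1)$, and at the correlation between $\bar{x}$ and $s^2$, which should be close to zero. A vanishing correlation is of course only a necessary consequence of independence, not a proof of it.

```python
import math
import random


def simulate(n, repeats, mu=1.0, sigma=2.0, seed=3):
    """Sample means and scaled variances (n - 1) s^2 / sigma^2 for
    `repeats` normal samples of size n."""
    rng = random.Random(seed)
    means, scaled = [], []
    for _ in range(repeats):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
        means.append(xbar)
        scaled.append((n - 1) * s2 / sigma ** 2)
    return means, scaled


def mean_and_variance(values):
    """Mean and (1/N) variance of a list of numbers."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)


def correlation(a, b):
    """Sample correlation coefficient of two equal-length lists."""
    ma, _ = mean_and_variance(a)
    mb, _ = mean_and_variance(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


if __name__ == "__main__":
    means, scaled = simulate(5, 20000)
    print(mean_and_variance(scaled), correlation(means, scaled))
```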

If we assume that a sample is drawn at random from a single normal
population with mean $\mu$ and variance $\sigma^2$ then from (6.1)

$$ \chi^2 = \frac{1}{\sigma^2} \sum_{i=1}^{\nu} (x_i - \mu)^2. $$

However, since the mean of the population is rarely known, in these cases
it is more useful to use the results of Theorem 6.2. In that case $\chi^2$ is distributed
with $(\nu - 1)$ degrees of freedom if $\bar{x}$ is used instead of $\mu$. In general, the
number of degrees of freedom must be reduced by one for each parameter estimated
from the data.

The $\chi^2$ distribution is one of the most important sampling distributions
occurring in physics. The density function is a family of curves, but a useful
table may be constructed by calculating the proportion $\alpha$ of the area under
the $\chi^2$ curves to the right of the point $\chi_\alpha^2$, i.e. points such that

$$ P[\chi^2 > \chi_\alpha^2] = \alpha = \int_{\chi_\alpha^2}^{\infty} f(\chi^2; \nu) \, d\chi^2. $$

Such points are called percentage points of the $\chi^2$-distribution and may be
deduced from the tables in Appendix D. They are also shown graphically in
Fig. (6.2). A point of interest about these curves is that for a fixed value of
$P$, $\chi^2/\nu \to 1$ as $\nu \to \infty$.

Fig. 6.2. Percentage points of the chi-square distribution. $P = P[\chi^2 \geq \chi_\alpha^2]$.


6.2 Student t distribution 

The Central Limit Theorem told us that the distribution of the sample
mean $\bar{x}$ was approximately normal with mean $\mu$ (the population mean) and
variance $\sigma^2/n$ (where $\sigma^2$ is the population variance and $n$ is the sample size).
Thus, in standard measure, the statistic

$$ u = \frac{\bar{x} - \mu}{\sigma/n^{1/2}} $$

is approximately normally distributed with mean 0 and unit variance for
large $n$. However, in experimental situations neither the population mean nor
the population variance is known, and both must be replaced by estimates
calculated from the sample. One can safely replace $\sigma^2$ by the sample variance $s^2$ for
large $n$ ($n > 50$), but for small $n$ the statistic will not be approximately normally
distributed and serious loss of meaning in interpretation will occur. In such
cases the Student t distribution must be used. This distribution, which we will
discuss in this section, enables one to use the sample variance, as well as the
sample mean, to make statements about the population mean. We shall
concentrate our results in four important theorems.


Theorem 6.4. Let $u$ have a normal distribution with mean zero and unit
variance. Further, let $w$ have a $\chi^2$ distribution with $\nu$ degrees of freedom, and
let $u$ and $w^{1/2}$ be independently distributed. Then the random variable

$$ t = \frac{u}{(w/\nu)^{1/2}}, $$

has a density function

$$ f(t; \nu) = \frac{\Gamma[(\nu + 1)/2]}{(\pi\nu)^{1/2} \Gamma(\nu/2)} \left[ 1 + \frac{t^2}{\nu} \right]^{-(\nu+1)/2}, \qquad -\infty < t < \infty, \tag{6.5} $$

with mean zero and variance $\nu/(\nu - 2)$ for $\nu > 2$. The statistic $t$ is said to have
a Student t distribution, with $\nu$ degrees of freedom.


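Proof. From Eqns (6.2) and (4.10) the joint density function of $u$ and $w$ is

$$ f(u, w; \nu) = \frac{e^{-u^2/2}}{(2\pi)^{1/2}} \, \frac{w^{(\nu-2)/2} e^{-w/2}}{\Gamma(\nu/2) 2^{\nu/2}}. \tag{6.6} $$

If we now substitute

$$ u = t \left( \frac{w}{\nu} \right)^{1/2}, $$

then (6.6) becomes

$$ f(t, w; \nu) = \left( \frac{w}{\nu} \right)^{1/2} \frac{e^{-t^2 w/2\nu} \, e^{-w/2} \, w^{(\nu-2)/2}}{(2\pi)^{1/2} \Gamma(\nu/2) 2^{\nu/2}}, \tag{6.7} $$

where the factor $(w/\nu)^{1/2}$ is the Jacobian of the substitution, and

$$ f(t; \nu) = \int_{0}^{\infty} f(t, w; \nu) \, dw. $$

This integral may be evaluated directly using Eqn (6.3) and gives Eqn (6.5).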
To find the mean and variance we again use the method of the m.g.f.
From (6.5) and (3.23) we see that moments of order $r$ exist only for $r < \nu$,
and are zero by symmetry for odd order moments. For even order moments
direct integration gives

$$ \mu_{2r} = \nu^r \frac{\Gamma(r + 1/2) \Gamma(\nu/2 - r)}{\Gamma(1/2) \Gamma(\nu/2)}, \qquad 2r < \nu. \tag{6.8} $$

The mean and variance may be obtained from (6.8). They are

$$ \mu = 0; \qquad \sigma^2 = \frac{\nu}{\nu - 2}. $$
The second theorem specifies the distribution of the difference of the sample 
mean and the population mean with respect to the sample variance. 


Theorem 6.5. Let $x_i$ $(i = 1, \ldots, n)$ be a random sample of size $n$ drawn
from a normal population with mean $\mu$ and variance $\sigma^2$. Then the statistic

$$ t = \frac{n^{1/2}}{s} (\bar{x} - \mu), \tag{6.9} $$

where

$$ s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2, \tag{6.10} $$

and

$$ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, $$

is distributed as the Student t distribution with $(n - 1)$ degrees of freedom.


Proof. If the mean and variance of the population are $\mu$ and $\sigma^2$, respectively,
then the statistic

$$ u = \frac{\bar{x} - \mu}{\sigma/n^{1/2}} $$

is distributed as $N(u; 0, 1)$. Furthermore, from Theorem 6.2 we know that
the statistic

$$ w = (n - 1) \frac{s^2}{\sigma^2}, $$

is distributed as $\chi^2$ with $n - 1$ degrees of freedom and, by Theorem 6.3,
independently of $u$. Therefore, from Theorem 6.4 the statistic

$$ t = \frac{u}{[w/(n - 1)]^{1/2}} = \frac{n^{1/2}}{s} (\bar{x} - \mu), $$

is distributed as $t$ with $(n - 1)$ degrees of freedom.
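
In practice the statistic of Theorem 6.5 is computed directly from the sample. A minimal sketch, with a hypothetical function name and made-up data, is:

```python
import math


def one_sample_t(xs, mu):
    """Student t statistic n^(1/2) (xbar - mu) / s and its degrees of
    freedom (n - 1), for a hypothesized population mean mu."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are needed")
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return math.sqrt(n) * (xbar - mu) / math.sqrt(s2), n - 1


if __name__ == "__main__":
    # Illustrative data; mu = 2 is the hypothesized population mean.
    print(one_sample_t([1.0, 2.0, 3.0, 4.0, 5.0], 2.0))
```

The value obtained would then be compared with the percentage points of the t distribution for $n - 1$ degrees of freedom.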

The third theorem concerns the asymptotic behaviour of the t distribution. 

Theorem 6.6. As the number of degrees of freedom of the Student t
distribution approaches infinity, the distribution tends to the normal distribution in
standard form.

Proof. If in Eqn (6.8) we use Stirling's approximation in the gamma
functions, i.e.

$$ \Gamma(\nu + 1) \to (2\pi)^{1/2} \nu^{\nu + 1/2} e^{-\nu}, \qquad (\nu \to \infty), \tag{6.11} $$

then

$$ \mu_{2r} \to \frac{(2r)!}{2^r r!}. \tag{6.12} $$

However, from (4.7) we know that this is the expression for the moments of
the normal distribution expressed in standard measure. Thus, by Theorem
3.2, the Student t distribution tends to a normal distribution with mean zero
and unit variance.
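
The approach of the moments (6.8) to the normal values (6.12) can be seen numerically. The sketch below uses hypothetical function names and evaluates (6.8) through logarithms of gamma functions to avoid overflow at large $\nu$.

```python
import math


def t_even_moment(r, nu):
    """Even moment mu_2r of the Student t distribution, Eqn (6.8);
    defined only for 2r < nu."""
    if 2 * r >= nu:
        raise ValueError("the moment exists only for 2r < nu")
    log_m = (r * math.log(nu)
             + math.lgamma(r + 0.5) + math.lgamma(nu / 2.0 - r)
             - math.lgamma(0.5) - math.lgamma(nu / 2.0))
    return math.exp(log_m)


def normal_even_moment(r):
    """Even moment (2r)! / (2^r r!) of the standardized normal, Eqn (6.12)."""
    return math.factorial(2 * r) / (2 ** r * math.factorial(r))


if __name__ == "__main__":
    for nu in (10, 100, 10000):
        print(nu, t_even_moment(1, nu), t_even_moment(2, nu))
    print(normal_even_moment(1), normal_even_moment(2))
```

For $r = 1$ the first function reproduces the variance $\nu/(\nu - 2)$.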

The final theorem concerns the t distribution when two normal populations 
are involved. 


Theorem 6.7. Let random samples $x_{11}, x_{12}, \ldots, x_{1n_1}$, and $x_{21}, x_{22}, \ldots, x_{2n_2}$
of sizes $n_1$ and $n_2$, respectively, be independently drawn from two normal
populations 1 and 2 with means $\mu_1$ and $\mu_2$, and the same variance $\sigma^2$. Then,
if we define

$$ \bar{x}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij}, \qquad i = 1, 2, $$

the statistic

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)^{1/2}}, \tag{6.13} $$

where $s_p^2$, the pooled sample variance, is given by

$$ s_p^2 = \frac{\sum_{i=1}^{2} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2}{n_1 + n_2 - 2}, \tag{6.14} $$

has a Student t distribution with $\nu = n_1 + n_2 - 2$ degrees of freedom.


Proof. Equation (6.14) may be written

$$ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}. $$

Thus, by Theorem 6.2 and the additive property of $\chi^2$, the quantity

$$ w = s_p^2 (n_1 + n_2 - 2)/\sigma^2, \tag{6.15} $$

is distributed as $\chi^2$ with $(n_1 + n_2 - 2)$ degrees of freedom. Furthermore, we
know, from Eqns (5.31) and (5.32), that $\bar{x} = \bar{x}_1 - \bar{x}_2$ is normally distributed
with mean $\mu = \mu_1 - \mu_2$ and variance

$$ \sigma_{\bar{x}}^2 = \sigma^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right). $$

Thus the quantity

$$ u = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \tag{6.16} $$

is normally distributed with mean zero and unit variance. Now, by Theorem
6.3, $\bar{x}_i$ and $s_i^2$ $(i = 1, 2)$ are independent random variables, and hence $u$ and
$w$ are independent random variables. Thus, by Theorem 6.4, the quantity

$$ t = \frac{u}{[w/(n_1 + n_2 - 2)]^{1/2}}, \tag{6.17} $$

has a t distribution with $(n_1 + n_2 - 2)$ degrees of freedom. Substituting
(6.15) and (6.16) into (6.17) gives (6.13) and completes the proof.
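
The statistic (6.13) with the pooled variance (6.14) might be computed as in the following sketch. The function name, the argument `delta` (standing for the hypothesized difference $\mu_1 - \mu_2$) and the data are introduced here for illustration only.

```python
import math


def pooled_t(xs1, xs2, delta=0.0):
    """Two-sample Student t statistic of Eqn (6.13), using the pooled
    sample variance of Eqn (6.14); returns (t, degrees of freedom)."""
    n1, n2 = len(xs1), len(xs2)
    if n1 + n2 < 3:
        raise ValueError("at least three observations in total are needed")
    mean1 = sum(xs1) / n1
    mean2 = sum(xs2) / n2
    ss1 = sum((x - mean1) ** 2 for x in xs1)
    ss2 = sum((x - mean2) ** 2 for x in xs2)
    dof = n1 + n2 - 2
    s_p2 = (ss1 + ss2) / dof
    t = ((mean1 - mean2) - delta) / math.sqrt(s_p2 * (1.0 / n1 + 1.0 / n2))
    return t, dof


if __name__ == "__main__":
    print(pooled_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0]))
```

The assumption of a common variance $\sigma^2$ in the two populations is essential to the theorem and is not checked by the sketch.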

Like the $\chi^2$ distribution the Student t distribution is also a one-parameter
family of curves. Tables of percentage points are given in Appendix D, and
in using them one can use the fact that

$$ P[t < -t_\alpha(\nu)] = P[t > t_\alpha(\nu)] = \alpha, $$

since the distribution is symmetric about $t = 0$. The percentage points are
also shown graphically in Fig. (6.3).

Fig. 6.3. Percentage points of the Student t distribution. $P = P[t \geq t_\alpha]$.


6.3 F distribution 

The F distribution is designed for use in situations where we wish to compare two variances, or more than two means, situations for which the $\chi^2$ and Student t distributions are not appropriate.

We will begin by constructing the form of the F distribution.

Theorem 6.8. Let the two independent random variables $\chi_i^2$ (i = 1, 2) be distributed as $\chi^2$ with $\nu_i$ degrees of freedom. Then the statistic

$$ F \equiv F(\nu_1, \nu_2) = \frac{\chi_1^2/\nu_1}{\chi_2^2/\nu_2}, \tag{6.18} $$

is distributed with density function

$$ f(F; \nu_1, \nu_2) = \frac{\Gamma[(\nu_1 + \nu_2)/2]}{\Gamma(\nu_1/2)\Gamma(\nu_2/2)} \left( \frac{\nu_1}{\nu_2} \right)^{\nu_1/2} \frac{F^{(\nu_1 - 2)/2}}{(1 + \nu_1 F/\nu_2)^{(\nu_1 + \nu_2)/2}}, \qquad F > 0, \tag{6.19} $$

with mean

$$ \mu = \frac{\nu_2}{\nu_2 - 2}, \qquad \nu_2 > 2, $$

and variance

$$ \sigma^2 = \frac{2\nu_2^2 (\nu_1 + \nu_2 - 2)}{\nu_1 (\nu_2 - 2)^2 (\nu_2 - 4)}, \qquad \nu_2 > 4. $$

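Proof. Let

$$ u = \chi_1^2; \qquad v = \chi_2^2; $$

then by Theorem 6.1 the joint density function of u and v is

$$ f(u, v) = \frac{u^{(\nu_1 - 2)/2}\, v^{(\nu_2 - 2)/2}}{\Gamma(\nu_1/2)\Gamma(\nu_2/2)\, 2^{(\nu_1 + \nu_2)/2}} \exp[-\tfrac{1}{2}(u + v)]. $$

Substituting

$$ u = \frac{\nu_1 F v}{\nu_2}, \qquad du = \frac{\nu_1 v}{\nu_2}\, dF, $$

gives the joint density of F and v as

$$ f(F, v) = \frac{v^{(\nu_2 - 2)/2}}{\Gamma(\nu_1/2)\Gamma(\nu_2/2)\, 2^{(\nu_1 + \nu_2)/2}} \, \frac{\nu_1 v}{\nu_2} \left( \frac{\nu_1 F v}{\nu_2} \right)^{(\nu_1 - 2)/2} \exp\left[ -\frac{v}{2} \left( 1 + \frac{\nu_1 F}{\nu_2} \right) \right]. $$

To obtain the density function of F we integrate out the dependence on v. Thus

$$ f(F; \nu_1, \nu_2) = \frac{F^{(\nu_1 - 2)/2}}{\Gamma(\nu_1/2)\Gamma(\nu_2/2)\, 2^{(\nu_1 + \nu_2)/2}} \left( \frac{\nu_1}{\nu_2} \right)^{\nu_1/2} I(F; \nu_1, \nu_2), $$

where

$$ I(F; \nu_1, \nu_2) = \int_0^\infty v^{(\nu_1 + \nu_2 - 2)/2} \exp\left[ -\frac{v}{2} \left( 1 + \frac{\nu_1 F}{\nu_2} \right) \right] dv = \frac{\Gamma[(\nu_1 + \nu_2)/2]\, 2^{(\nu_1 + \nu_2)/2}}{(1 + \nu_1 F/\nu_2)^{(\nu_1 + \nu_2)/2}}. $$

Equation (6.19) follows directly.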

The m.g.f. may be deduced in the usual way. One finds that moments of order r exist only for $2r < \nu_2$, and are given by

$$ \mu_r' = \left( \frac{\nu_2}{\nu_1} \right)^r \frac{\Gamma(r + \nu_1/2)\, \Gamma(\nu_2/2 - r)}{\Gamma(\nu_1/2)\, \Gamma(\nu_2/2)}. \tag{6.20} $$

The mean and variance follow directly. By using (6.20) to calculate the skewness it can be shown that the F distribution is always skewed.
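The density (6.19) and the quoted mean can be checked numerically. The sketch below is illustrative only: the helper names `f_density` and `integrate`, the choice $\nu_1 = 4$, $\nu_2 = 10$, and the truncation of the integration range at F = 200 are all assumptions made for this example.

```python
import math

def f_density(F, nu1, nu2):
    """Density of Eqn (6.19), evaluated through logarithms for numerical stability."""
    if F <= 0.0:
        return 0.0
    log_norm = (math.lgamma((nu1 + nu2) / 2.0)
                - math.lgamma(nu1 / 2.0) - math.lgamma(nu2 / 2.0)
                + (nu1 / 2.0) * math.log(nu1 / nu2))
    log_shape = (((nu1 - 2.0) / 2.0) * math.log(F)
                 - ((nu1 + nu2) / 2.0) * math.log1p(nu1 * F / nu2))
    return math.exp(log_norm + log_shape)

def integrate(func, a, b, steps=100000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

nu1, nu2 = 4, 10
# The upper limit 200 is an assumed cut-off; the neglected tail is tiny here.
norm = integrate(lambda F: f_density(F, nu1, nu2), 0.0, 200.0)
mean = integrate(lambda F: F * f_density(F, nu1, nu2), 0.0, 200.0)
print(norm, mean, nu2 / (nu2 - 2))
```

The first number should be close to unity and the second close to $\nu_2/(\nu_2 - 2)$, within the accuracy of the quadrature.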

The density function of the F distribution is more complicated than that of either the $\chi^2$ or t distribution in being a two-parameter family of curves.

Percentage points are defined in the same way as for the $\chi^2$ distribution. Thus

$$ P[F > F_\alpha] = \alpha = \int_{F_\alpha}^{\infty} f(F; \nu_1, \nu_2)\, dF. $$

Right-tail percentage points may be obtained from the tables in Appendix D, and should the left-tail percentage points be needed they may be obtained from the relation

$$ F_{1-\alpha}(\nu_1, \nu_2) = [F_\alpha(\nu_2, \nu_1)]^{-1}. $$

The percentage points for P = 0.05 are also shown graphically in Fig. 6.4.


6.4 Relation between $\chi^2$, t and F distributions

The F distribution is related in a simple way to the $\chi^2$ and Student t distributions, as follows.

It is easy to show that as $\nu \to \infty$, $P[|\chi^2/\nu - 1| > \epsilon] \to 0$ for any $\epsilon > 0$. Thus, letting $\nu_2 \to \infty$ in (6.18),

$$ F(\nu_1, \infty) = \frac{\chi_1^2}{\nu_1}. \tag{6.21} $$

Thus, the distribution of $\chi_1^2/\nu_1$ with $\nu_1$ degrees of freedom is a special case of the F distribution with $\nu_1$ and $\infty$ degrees of freedom. Thus for any $\alpha$ we have

$$ F_\alpha(\nu_1, \infty) = \frac{\chi_\alpha^2(\nu_1)}{\nu_1}, \tag{6.22} $$

which may be directly verified by the use of a set of tables. If we consider the limit as $\nu_1 \to \infty$ then we have

$$ F(\infty, \nu_2) = \frac{\nu_2}{\chi_2^2}, \tag{6.23} $$

and

$$ F_\alpha(\infty, \nu_2) = \frac{\nu_2}{\chi_{1-\alpha}^2(\nu_2)}. \tag{6.24} $$

Thus the left-tail percentage points of the $\chi^2/\nu$ distribution are special cases of the right-tail percentage points of $F(\infty, \nu)$.


Fig. 6.4. Percentage points of the F distribution for $P = P[F > F_\alpha] = 0.05$.

To relate the F distribution to the Student t distribution we note that when $\nu_1 = 1$, $\chi_1^2/\nu_1 = u^2$, where u is a standard normal variate. Thus we may write

$$ F(1, \nu_2) = \frac{u^2}{\chi_2^2/\nu_2}. \tag{6.25} $$

Now from Theorem 6.4

$$ t = \frac{u}{(\chi_2^2/\nu_2)^{1/2}}, \tag{6.26} $$

is distributed as the Student t distribution with $\nu_2$ degrees of freedom, and so we may write (6.25) as

$$ F(1, \nu_2) = t^2(\nu_2). \tag{6.27} $$

Using (6.27) we may rewrite

$$ P[F(1, \nu_2) < F_\alpha(1, \nu_2)] = 1 - \alpha $$

as

$$ P[-(F_\alpha(1, \nu_2))^{1/2} < t(\nu_2) < (F_\alpha(1, \nu_2))^{1/2}] = 1 - \alpha, $$

and using the symmetry of the t distribution about t = 0 we have

$$ P[t(\nu_2) < -(F_\alpha(1, \nu_2))^{1/2}] = P[t(\nu_2) > (F_\alpha(1, \nu_2))^{1/2}] = \alpha/2. \tag{6.28} $$

But

$$ P[t(\nu_2) > t_{\alpha/2}(\nu_2)] = \alpha/2, $$

and so

$$ t_{\alpha/2}(\nu_2) = [F_\alpha(1, \nu_2)]^{1/2}, $$

or

$$ F_\alpha(1, \nu_2) = t_{\alpha/2}^2(\nu_2). \tag{6.29} $$

Similarly, we can show that for $\nu_2 = 1$

$$ F(\nu_1, 1) = [t^2(\nu_1)]^{-1}, \tag{6.30} $$

and

$$ F_\alpha(\nu_1, 1) = [t_{(1+\alpha)/2}^2(\nu_1)]^{-1}. \tag{6.31} $$

Finally, if $\nu_1 = 1$ and $\nu_2 \to \infty$

$$ F_\alpha(1, \infty) = u_{\alpha/2}^2, \tag{6.32} $$

and if $\nu_2 = 1$ and $\nu_1 \to \infty$

$$ F_\alpha(\infty, 1) = [u_{(1+\alpha)/2}^2]^{-1}, \tag{6.33} $$

where $u_\alpha$ is a point of the standard normal variate such that

$$ P[u > u_\alpha] = \alpha. $$

The above results are summarized in Table 6.2.
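The identity $F(1, \nu_2) = t^2(\nu_2)$ can be checked at the level of densities: if t has the Student density, then $t^2$ has density $f_t(\sqrt{F})/\sqrt{F}$, which should coincide with (6.19) for $\nu_1 = 1$. The sketch below makes that comparison; the function names and the test points are hypothetical, and the Student density is written in its standard form, which is assumed to agree with the one used earlier in the chapter.

```python
import math

def f_density(F, nu1, nu2):
    """Density of Eqn (6.19) for F > 0."""
    log_norm = (math.lgamma((nu1 + nu2) / 2.0)
                - math.lgamma(nu1 / 2.0) - math.lgamma(nu2 / 2.0)
                + (nu1 / 2.0) * math.log(nu1 / nu2))
    log_shape = (((nu1 - 2.0) / 2.0) * math.log(F)
                 - ((nu1 + nu2) / 2.0) * math.log1p(nu1 * F / nu2))
    return math.exp(log_norm + log_shape)

def t_density(t, nu):
    """Standard Student t density with nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1) / 2.0) - math.lgamma(nu / 2.0)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - ((nu + 1) / 2.0) * math.log1p(t * t / nu))

def density_of_t_squared(F, nu):
    """Density of t^2: both branches t = +sqrt(F) and t = -sqrt(F) contribute."""
    root = math.sqrt(F)
    return t_density(root, nu) / root

# Largest relative difference over a few arbitrary test points.
worst = max(
    abs(f_density(F, 1, nu) / density_of_t_squared(F, nu) - 1.0)
    for nu in (1, 5, 30)
    for F in (0.1, 1.0, 4.0, 25.0)
)
print(worst)
```

A value of `worst` at the level of rounding error is what Eqn (6.27) leads one to expect.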


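Table 6.2 Percentage points $F_\alpha$ of the $F(\nu_1, \nu_2)$ distribution and their relation to the $\chi^2$ and Student t distributions

$\nu_2$ \ $\nu_1$     1                                                  $\nu_1$                            $\infty$

1                     $t_{\alpha/2}^2(1) = 1/t_{(1+\alpha)/2}^2(1)$      $1/t_{(1+\alpha)/2}^2(\nu_1)$      $1/u_{(1+\alpha)/2}^2$

$\nu_2$               $t_{\alpha/2}^2(\nu_2)$                            $F_\alpha(\nu_1, \nu_2)$           $\nu_2/\chi_{1-\alpha}^2(\nu_2)$

$\infty$              $u_{\alpha/2}^2$                                   $\chi_\alpha^2(\nu_1)/\nu_1$       1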



7 Estimation I: Maximum Likelihood 

In previous chapters we have encountered the problem of estimating from a sample the values of the parameters of a population. For example, we have used the sample mean as defined by Eqn (5.1) as an estimate of the population mean, and we have seen that this choice is supported by the Laws of Large Numbers. However, in the case of the variance, we did not define the sample variance by analogy with the population variance (cf. Eqns (5.2) and (3.3)). It is clearly time to consider the whole question of estimation more carefully, and in this section we will discuss firstly the properties required of methods of estimation in general, and then consider estimation by the so-called Maximum Likelihood Method. The discussion will be extended in Chapters 8 and 9 to include estimation by several other methods.

7.1 Properties of point estimators 

Firstly, it is necessary to consider rather more closely what we mean by estimation, and for this purpose it is useful to distinguish between the terms “estimator” and “estimate”. By the former we mean the method of estimation, and by the latter we mean the value to which it gives rise in a particular case. The estimator is a random variable (e.g. the sample mean) and gives rise to a population of estimates, so the merit of an estimator is to be judged by the quality of this population, and not by the value of a particular estimate. To be a suitable estimator a quantity must satisfy certain criteria and these are given below.

It is intuitively obvious that a desirable property of an estimator is that, 
as the sample size increases, the estimate tends to the value of the population 
parameter. Any other result is clearly misleading. This property is known 
as consistency and is defined as follows. 

Definition 7.1. An estimator $\hat\theta_n$, computed from a sample of size n, is said to be a consistent estimator of a population parameter $\theta$ if, for any positive $\epsilon$ and $\eta$, arbitrarily small, there exists some N such that

$$ P[|\hat\theta_n - \theta| < \epsilon] > 1 - \eta \quad \text{for all } n > N. \tag{7.1} $$

In these circumstances $\hat\theta_n$ is said to converge in probability to $\theta$. Thus $\hat\theta_n$ is a consistent estimator of $\theta$ if it converges in probability to $\theta$.



Example 7.1. Consider the Cauchy distribution, which we have defined in Section 4.4,

$$ f(x; \theta) = \frac{1}{\pi} \frac{1}{1 + (x - \theta)^2}. $$

The sample mean

$$ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, $$

has a distribution which is also of the Cauchy form, independent of sample size (cf. Example 5.1). Thus the sample mean cannot converge in probability to $\theta$ (or any constant), and hence is not a consistent estimator of $\theta$.
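The failure of consistency in Example 7.1 can be seen in a small simulation. The sketch below is an illustration under stated assumptions: it takes $\theta = 0$, draws Cauchy variates by the inverse-transform method, and uses the interquartile range as a measure of spread because the Cauchy distribution has no variance. The function names, seed and numbers of trials are hypothetical.

```python
import math
import random

def cauchy_sample_mean(n, rng, theta=0.0):
    """Mean of n Cauchy variates; theta + tan(pi*(u - 1/2)) is Cauchy about theta."""
    return sum(theta + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)) / n

def interquartile_range(values):
    """Crude interquartile range taken from the sorted values."""
    ordered = sorted(values)
    k = len(ordered)
    return ordered[(3 * k) // 4] - ordered[k // 4]

def spread_of_means(n, trials=2000, seed=2024):
    rng = random.Random(seed)
    return interquartile_range([cauchy_sample_mean(n, rng) for _ in range(trials)])

iqr_single = spread_of_means(1)     # spread of individual observations
iqr_hundred = spread_of_means(100)  # spread of means of 100 observations
print(iqr_single, iqr_hundred)
```

For a consistent estimator the second spread would be much smaller than the first; here both are expected to be near 2, the interquartile range of the Cauchy distribution itself.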

The property of consistency tells us the asymptotic ($n \to \infty$) behaviour of a suitable estimator. Having found such an estimator it is clear that we can generate an infinity of other estimators

$$ \hat\theta_n' = f(n)\, \hat\theta_n, \tag{7.2} $$

provided

$$ \lim_{n \to \infty} f(n) = 1. \tag{7.3} $$

However, we may further restrict the possible estimators by requiring that for all n the expected value of $\hat\theta_n$ is $\theta$. Such an estimator is called unbiased.

Definition 7.2. An estimator $\hat\theta_n$, computed from a sample of size n, is said to be an unbiased estimator of a population parameter $\theta$ if

$$ E[\hat\theta_n] = \theta, \tag{7.4} $$

for all n.


Example 7.2. By applying Eqn (7.4) to Eqn (5.1) we can trivially show that the sample mean $\bar{x}$ is an unbiased estimator of the population mean of Eqn (3.1). If we apply (7.4) to the definition of the sample variance of Eqn (5.2) we have

$$ E[s^2] = E\left[ \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \right] = \frac{n}{n - 1} \left( E[x^2] - E[\bar{x}^2] \right) = \mu_2' - (\mu_1')^2 = \sigma^2. \tag{7.5} $$

Thus the presence of the factor 1/(n - 1) in the definition of the sample variance is to ensure that

$$ E[s^2] = \sigma^2, \tag{7.6} $$

i.e. that $s^2$ is an unbiased estimator of $\sigma^2$. We will always use Eqn (5.2) as the definition of the sample variance. However, some authors prefer to work with the biased estimator

$$ s'^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2, $$

and so care should be taken when consulting different sources.
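The role of the factor 1/(n - 1) can be verified exactly for a small discrete population by enumerating every possible sample. The sketch below assumes a made-up population of three equally likely values sampled with replacement; the names are hypothetical, and exact rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction
from itertools import product

def sample_variance(xs, ddof):
    """Sum of squared deviations about the sample mean, divided by (n - ddof)."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)

population = [0, 1, 2]  # equally likely values, illustrative only
n = 3                   # sample size

mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

# Every ordered sample of size n is equally probable under sampling with replacement.
samples = list(product(population, repeat=n))
expect_unbiased = sum(sample_variance(s, 1) for s in samples) / len(samples)
expect_biased = sum(sample_variance(s, 0) for s in samples) / len(samples)
print(sigma2, expect_unbiased, expect_biased)
```

The average of $s^2$ over all samples reproduces $\sigma^2$, whereas the average of $s'^2$ is smaller by the factor (n - 1)/n.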

The requirements of consistency and lack of bias alone do not produce 
unique estimators. For example, one can easily show that the sample mean 
is a consistent and unbiased estimator of the mean of a normal population 
with known variance. But the same is true of the sample median. Thus we 
must impose further restrictions if uniqueness is required. One of these is 
known as the efficiency of an estimator. 

An unbiased estimator with a smaller variance will produce estimates more closely grouped round the population value $\theta$. An estimator with a smaller variance is said to be more efficient than one with a larger variance.

Definition 7.3. If two consistent estimators $\hat\theta_1$ and $\hat\theta_2$, both calculated from a sample of size n, have var $\hat\theta_1$ < var $\hat\theta_2$ then $\hat\theta_1$ is said to be more efficient than $\hat\theta_2$ for samples of size n.

Example 7.3. For the normal distribution we have, from Theorem 5.2, for any n

$$ \mathrm{var\,(mean)} = \sigma^2/n. $$

But for large n

$$ \mathrm{var\,(median)} = \frac{\pi\sigma^2}{2n} > \frac{\sigma^2}{n}. $$

Thus the mean is the more efficient estimator, at least for large n. (In fact this result is true for all n.) Consistent estimators whose sampling variance for large samples is less than that of any other such estimator are called most efficient. Such estimators serve to define a scale of efficiency. Thus if $\hat\theta_2$ has variance $v_2$ and $\hat\theta_1$, the most-efficient estimator, has variance $v_1$, then the efficiency of $\hat\theta_2$ is defined as

$$ E_2 = v_1/v_2. \tag{7.7} $$

It may still be that there exist several consistent estimators $\hat\theta$ for a population parameter $\theta$. Can one choose a “best” estimator from amongst them? The criterion of efficiency alone is not enough, since it is possible that one estimator $\hat\theta_n$, which is biased, is consistently closer to $\theta$ than an unbiased estimator $\hat\theta_n'$. In this case the quantity to consider is not the variance but the second moment of $\hat\theta_n$ about $\theta$, which is

$$ E[(\hat\theta_n - \theta)^2]. $$

When

$$ \theta = E[\hat\theta_n], $$

then

$$ E[(\hat\theta_n - \theta)^2] = \mathrm{var}\,(\hat\theta_n). $$

Thus we shall define $\hat\theta_n$ to be a best estimator of the parameter $\theta$ if

$$ E[(\hat\theta_n - \theta)^2] \le E[(\hat\theta_n' - \theta)^2], $$

where $\hat\theta_n'$ is any other estimator of $\theta$.
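Returning to the comparison of the mean and the median in Example 7.3, the relative efficiency can be estimated by simulation. The sketch below is illustrative only: the sample size, number of trials and seed are arbitrary choices, the population is taken to be standard normal, and the function name is hypothetical.

```python
import random
import statistics

def estimator_variances(n, trials, seed=12345):
    """Sampling variances of the mean and the median for standard normal samples."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(xs))
        medians.append(statistics.median(xs))
    return statistics.pvariance(means), statistics.pvariance(medians)

n = 101
var_mean, var_median = estimator_variances(n, trials=3000)
efficiency_of_median = var_mean / var_median
print(var_mean, var_median, efficiency_of_median)
```

For large n the efficiency of the median is expected to approach $2/\pi \approx 0.64$, subject to the statistical fluctuations of the simulation.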

The properties of estimators considered up to now give a good idea of the desirable properties of estimators. However, there is a more general criterion which we will now consider. Consider the case of estimating a parameter $\theta$. Let

$$ f(\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_r; \theta), $$

be the joint density function of r independent estimators $\hat\theta_i$ (i = 1, 2, ..., r). Then, from the definition of the multivariate conditional density Eqn (3.30), we have

$$ f(\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_r; \theta) = f^M(\hat\theta_1; \theta)\, f^C(\hat\theta_2, \hat\theta_3, \ldots, \hat\theta_r; \theta \,|\, \hat\theta_1), \tag{7.8} $$

where $f^M(\hat\theta_1; \theta)$ is the marginal density of $\hat\theta_1$ and $f^C(\hat\theta_2, \ldots, \hat\theta_r; \theta \,|\, \hat\theta_1)$ is the conditional density of all the other $\hat\theta_i$, given $\hat\theta_1$. Now if $f^C$ is independent of $\theta$ then clearly once $\hat\theta_1$ is specified the other estimators contribute nothing further to the problem of estimating $\theta$, i.e. $\hat\theta_1$ contains all the information about $\theta$. In these circumstances $\hat\theta_1$ is called a sufficient statistic for $\theta$. It is more convenient in practice to write (7.8) as a condition on the likelihood function.

Definition 7.4. Let $f(x; \theta)$ denote the density function of a random variable x, where the form of f is known but not the value of $\theta$, which is to be estimated. Let $x_1, x_2, \ldots, x_n$ be a random sample of size n. The joint density function $f(x_1, x_2, \ldots, x_n; \theta)$ of the independent random variables $x_1, x_2, \ldots, x_n$ is given by

$$ f(x_1, x_2, \ldots, x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta), \tag{7.9} $$

where $f(x_i; \theta)$ is the density function for the ith random variable. $f(x_1, x_2, \ldots, x_n; \theta)$ is called the likelihood function of $\theta$ and is written

$$ L(x_1, x_2, \ldots, x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta). \tag{7.10} $$

If L is expressible in the form

$$ L(x_1, x_2, \ldots, x_n; \theta) = L_1(\hat\theta, \theta)\, L_2(x_1, x_2, \ldots, x_n), \tag{7.11} $$

where $L_1$ does not contain the x's other than in the form $\hat\theta$, and $L_2$ is independent of $\theta$, then $\hat\theta$ is a sufficient statistic for the estimation of $\theta$.

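Example 7.4. We will find a sufficient statistic for estimating the value of the variance of a normal distribution with zero mean, i.e.

$$ f(x) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[ -\frac{x^2}{2\sigma^2} \right]. $$

From Eqn (7.9), the likelihood function is

$$ L(x_1, \ldots, x_n; \sigma^2) = \left( \frac{1}{2\pi\sigma^2} \right)^{n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} x_i^2 \right). $$

If we let $L_2 = 1$ in Eqn (7.11) then we have $L_1 = L$, and $L_1$ is a function of the sample only in terms of $\sum x_i^2$. Thus, by Definition 7.4, $\sum x_i^2$ is a sufficient estimator for $\sigma^2$.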
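The meaning of sufficiency in Example 7.4 can be made concrete: two samples of the same size with the same value of $\sum x_i^2$ give the same likelihood for every $\sigma^2$, however different the individual observations. The sketch below illustrates this with made-up samples; the function name is hypothetical.

```python
import math

def log_likelihood(xs, sigma2):
    """ln L for a zero-mean normal population with variance sigma2 (Example 7.4)."""
    n = len(xs)
    sum_sq = sum(x * x for x in xs)  # the data enter only through this sum
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - sum_sq / (2.0 * sigma2)

sample_a = [3.0, 4.0]  # sum of squares 25
sample_b = [5.0, 0.0]  # sum of squares 25
sample_c = [1.0, 1.0]  # sum of squares 2

for sigma2 in (0.5, 1.0, 7.0):
    print(sigma2,
          log_likelihood(sample_a, sigma2),
          log_likelihood(sample_b, sigma2),
          log_likelihood(sample_c, sigma2))
```

The first two columns of likelihood values agree for each trial value of $\sigma^2$, while the third differs.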


7.2 Maximum likelihood method 

Of all the possible methods of parameter estimation that of maximum 
likelihood is, in a sense to be discussed below, the most general, and is 
widely used in practice. We have briefly mentioned it in Chapter 2, but in 
the present chapter we will consider the method in more detail. 

7.2.1. Estimation of a Single Parameter 

The likelihood function has been defined in Eqn (7.10). If we suppress the dependence of L on $x_i$, then for a sample of size n

$$ L(\theta) = \prod_{i=1}^{n} f(x_i; \theta), \tag{7.12} $$

where $f(x; \theta)$ is the frequency function of the parent population. The maximum likelihood estimator is defined as follows.

Definition 7.5. The maximum likelihood estimator (M.L.E.) of a population parameter $\theta$ is that statistic $\hat\theta$ (we shall always use the symbol ^ to denote an estimate of a parameter) which maximizes $L(\theta)$ for variations of $\theta$, i.e. the solution (if it exists) of the equations

$$ \frac{\partial L(\theta)}{\partial\theta} = 0, \qquad \frac{\partial^2 L(\theta)}{\partial\theta^2} < 0. \tag{7.13} $$

Since $L(\theta) > 0$ the first equation is equivalent to

$$ \frac{1}{L} \frac{\partial L(\theta)}{\partial\theta} = \frac{\partial \ln L(\theta)}{\partial\theta} = 0, \tag{7.14} $$

which is the form more often used in practice. It is clear from (7.13) that the solution obtained by estimating the parameter $\theta$ is the same as that obtained by estimating a function of $\theta$, e.g. $F(\theta)$, since

$$ \frac{\partial \ln L}{\partial\theta} = \frac{\partial \ln L(F)}{\partial F} \cdot \frac{\partial F}{\partial\theta}, \tag{7.15} $$

and the two sides of the equation vanish together.

The importance of M.L.E. stems from the following four theorems, 
which we state without proof. 


Theorem 7.1. Maximum likelihood estimators are consistent. 


Theorem 7.2. Maximum likelihood estimators have a distribution which tends to normality for large samples.

Theorem 7.3. Maximum likelihood estimators have minimum variance in 
the limit of large samples. 

Theorem 7.4. If a sufficient estimator for a parameter exists then it is a 
function of the maximum likelihood estimator. 

There are situations in which the above theorems do not hold and the M.L.E. gives a poor estimate for a parameter, but for the common distributions met in practice they are valid. Proofs of these theorems, together with exact statements on the range of their validity, may be found in the books of Kendall and Stuart, and Cramer, cited in the Bibliography.

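The likelihood function $L(\theta)$ may be formally regarded as a probability density function for the parameter $\theta$ viewed as a random variable. Thus we may define the variance of the estimator as

$$ \mathrm{var}\,\hat\theta = \int_{-\infty}^{\infty} (\theta - \hat\theta)^2 L(\theta)\, d\theta \bigg/ \int_{-\infty}^{\infty} L(\theta)\, d\theta, \tag{7.16} $$

and, by analogy with the work of Section 5.3, an estimate from experimental data would be quoted as

$$ \theta = \hat\theta_e \pm \Delta\hat\theta, $$

where

$$ \Delta\hat\theta = (\mathrm{var}\,\hat\theta_e)^{1/2}. $$

From Theorem 7.2, it follows that, for large samples, the form of $L(\theta)$ is

$$ L(\theta) = \frac{1}{(2\pi v)^{1/2}} \exp\left[ -\frac{(\theta - \hat\theta)^2}{2v} \right], \tag{7.17} $$

where

$$ v = \mathrm{var}\,\hat\theta. $$

Now

$$ \ln L(\theta) = -\ln[(2\pi v)^{1/2}] - \frac{(\theta - \hat\theta)^2}{2v}, $$

and

$$ \frac{\partial^2 \ln L(\theta)}{\partial\theta^2} = -\frac{1}{v}. \tag{7.18} $$

Thus

$$ \mathrm{var}\,\hat\theta = \left[ -\frac{\partial^2 \ln L(\theta)}{\partial\theta^2} \right]^{-1}. \tag{7.19} $$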


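This is another commonly used form for the variance of an estimate. We shall illustrate the use of the maximum likelihood method for one parameter by two examples.

Example 7.5. We shall estimate the parameter $\mu$ in the normal population given by

$$ f(x; \mu, \sigma) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right], \tag{7.20} $$

where $\sigma$ is known and $-\infty < x < \infty$, and also calculate its variance.

From (7.12) we have

$$ \ln L(\mu) = -n \ln[(2\pi\sigma^2)^{1/2}] - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2, $$

and hence the M.L.E. of $\mu$ is the solution of

$$ \frac{\partial \ln L(\mu)}{\partial\mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0, $$

i.e.

$$ \hat\mu = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}. \tag{7.21} $$

Thus the sample mean $\bar{x}$ is the M.L.E. of the parameter $\mu$.

From (7.19) the variance of $\hat\mu$ is given by

$$ \mathrm{var}\,\hat\mu = \left[ -\frac{\partial}{\partial\mu} \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) \right]^{-1} = \sigma^2/n, $$

as expected from the result of Theorem 5.2.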

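Example 7.6. As a more practical example we shall consider estimation of the parameter $\mu$ in the same normal population, but now for a set of experimental observations $x_i$ of the same quantity made with associated errors $\Delta x_i$. The density function is

$$ f(x_i; \mu, \Delta x_i) = \frac{1}{(2\pi)^{1/2} \Delta x_i} \exp\left[ -\frac{1}{2} \left( \frac{x_i - \mu}{\Delta x_i} \right)^2 \right], \tag{7.22} $$

from which

$$ \ln L(\mu) = -\sum_{i=1}^{n} \ln[(2\pi)^{1/2} \Delta x_i] - \frac{1}{2} \sum_{i=1}^{n} \left( \frac{x_i - \mu}{\Delta x_i} \right)^2, $$

and

$$ \frac{\partial \ln L(\mu)}{\partial\mu} = \sum_{i=1}^{n} \left[ \frac{x_i - \mu}{(\Delta x_i)^2} \right]. \tag{7.23} $$

Setting (7.23) to zero gives

$$ \hat\mu = \sum_{i=1}^{n} (x_i/\Delta x_i^2) \bigg/ \sum_{i=1}^{n} (1/\Delta x_i^2), \tag{7.24} $$

a result which is often called the weighted mean of a set of observations. The variance of $\hat\mu$ may be found from (7.23) and (7.19) directly. It is

$$ \mathrm{var}\,\hat\mu = (\Delta\hat\mu)^2 = \left[ \sum_{i=1}^{n} \frac{1}{(\Delta x_i)^2} \right]^{-1}. \tag{7.25} $$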
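The weighted mean (7.24) and its error (7.25) are straightforward to evaluate. The following is a minimal sketch; the function name and the two measurements are made up for illustration.

```python
import math

def weighted_mean(values, errors):
    """Return (mu_hat, delta_mu) from Eqns (7.24) and (7.25)."""
    weights = [1.0 / (dx * dx) for dx in errors]  # 1 / (Delta x_i)^2
    total = sum(weights)
    mu_hat = sum(w * x for w, x in zip(weights, values)) / total
    return mu_hat, math.sqrt(1.0 / total)

# Two hypothetical measurements of the same quantity: 10.0 +- 1.0 and 12.0 +- 2.0.
mu_hat, delta_mu = weighted_mean([10.0, 12.0], [1.0, 2.0])
print(mu_hat, delta_mu)
```

The more precise measurement dominates the result, and the combined error is smaller than either individual error.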


If an experiment has “good statistics” then the likelihood function will indeed be a close approximation to a normal distribution. However, many effects may be present which produce a function which is clearly not normal. In this case the use of Eqn (7.19) usually produces an underestimate for $\Delta\hat\theta$. A more realistic estimate of $\Delta\hat\theta$ is to average $\partial^2 \ln L(\theta)/\partial\theta^2$ over the likelihood function, i.e. take

$$ (\Delta\hat\theta)^{-2} = -\int \frac{\partial^2 \ln L(\theta)}{\partial\theta^2} L(\theta)\, d\theta \bigg/ \int L(\theta)\, d\theta, \tag{7.26} $$

or alternatively, one could plot $L(\theta)$, and then find the two values of $\theta$ where $L(\theta)$ had fallen by a factor $e^{1/2}$ from its maximum, i.e. the two values which would correspond to $\mu \pm \sigma$ for a normal distribution of mean $\mu$ and variance $\sigma^2$.

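Another useful formula for $\Delta\hat\theta$ may be derived for situations where one wants to answer the question: how many data are required to establish a particular result to within a specified accuracy? The problem is to find a value for $\partial^2 \ln L(\theta)/\partial\theta^2$ averaged over many repeated experiments consisting of n events each. Since

$$ \ln L(x; \theta) = \sum_{i=1}^{n} \ln f(x_i; \theta), $$

we have

$$ E\left[ \frac{\partial^2 \ln L}{\partial\theta^2} \right] = n \int \frac{\partial^2 \ln f(x; \theta)}{\partial\theta^2} f(x; \theta)\, dx = n E\left[ \frac{\partial^2 \ln f(x; \theta)}{\partial\theta^2} \right]. \tag{7.27} $$

This form may be used in (7.19) directly, or it may be expressed in terms of first derivatives by

$$ \int \frac{\partial^2 \ln f(x; \theta)}{\partial\theta^2} f(x; \theta)\, dx = -\int \left( \frac{\partial f(x; \theta)}{\partial\theta} \right)^2 \frac{dx}{f(x; \theta)}, \tag{7.28} $$

which follows because the integral of $\partial^2 f/\partial\theta^2$ over x vanishes for a normalized density. Using (7.27) and (7.28) in (7.19) gives

$$ \Delta\hat\theta = \frac{1}{n^{1/2}} \left[ \int \left( \frac{\partial f(x; \theta)}{\partial\theta} \right)^2 \frac{1}{f(x; \theta)}\, dx \right]^{-1/2}. \tag{7.29} $$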


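Example 7.7. Consider a distribution with density function

$$ f(x; \theta) = \tfrac{1}{2}(1 + \theta x), \qquad -1 \le x \le 1. \tag{7.30} $$

We have

$$ \frac{\partial f(x; \theta)}{\partial\theta} = \frac{x}{2}, $$

and

$$ \int_{-1}^{1} \left( \frac{\partial f}{\partial\theta} \right)^2 \frac{1}{f(x; \theta)}\, dx = \frac{1}{2\theta^3} \left[ \ln\left( \frac{1 + \theta}{1 - \theta} \right) - 2\theta \right]. \tag{7.31} $$

Thus from (7.29)

$$ \Delta\hat\theta = \left( \frac{2\theta^3}{n} \right)^{1/2} \left[ \ln\left( \frac{1 + \theta}{1 - \theta} \right) - 2\theta \right]^{-1/2}. \tag{7.32} $$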


Suppose we wish to establish how many events would be required to obtain $(\Delta\hat\theta/\theta) = 0.01$ for $\theta = 0.5$. Substituting these numbers into (7.32) gives directly $n \simeq 1.0 \times 10^5$ events. In particular, Eqn (7.29) shows that to increase the precision of the experiment m-fold requires $m^2$ times as many events.

All the results of this subsection may be generalized in a straightforward way to the case of estimating a single parameter from a multivariate distribution.
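The event count in Example 7.7 can be checked by evaluating (7.31) and inverting (7.29) for n. The sketch below also compares the closed form of the integral with a simple numerical quadrature; the function names and the number of quadrature steps are hypothetical choices.

```python
import math

def information_per_event(theta):
    """Closed form of the integral in Eqn (7.31) for f = (1 + theta*x)/2."""
    return (math.log((1.0 + theta) / (1.0 - theta)) - 2.0 * theta) / (2.0 * theta ** 3)

def information_numeric(theta, steps=20000):
    """Midpoint-rule evaluation of the same integral over -1 <= x <= 1."""
    h = 2.0 / steps
    total = 0.0
    for i in range(steps):
        x = -1.0 + (i + 0.5) * h
        total += (x / 2.0) ** 2 / (0.5 * (1.0 + theta * x))
    return h * total

def events_required(theta, rel_precision):
    """Solve Eqn (7.29) for n, given the target Delta(theta)/theta."""
    delta = rel_precision * theta
    return 1.0 / (information_per_event(theta) * delta ** 2)

n_events = events_required(0.5, 0.01)
print(information_per_event(0.5), information_numeric(0.5), n_events)
```

Halving the target error in `events_required` multiplies the required number of events by four, in line with the remark on Eqn (7.29).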


7.2.2. Simultaneous Estimation of Several Parameters 

If we wish to estimate simultaneously several parameters then the preceding results generalize in a straightforward way. Thus, the maximum likelihood equation now becomes the set of simultaneous equations

$$ \frac{\partial \ln L(\theta_1, \theta_2, \ldots, \theta_p)}{\partial\theta_i} = 0, \qquad i = 1, 2, \ldots, p. \tag{7.33} $$

Also the results of Theorems 7.1 to 7.4 hold. We shall state explicitly the generalization of Theorem 7.2.

Theorem 7.5. The M.L.E. $\hat\theta_i$ (i = 1, 2, ..., p) for the parameters of a density $f(x; \theta_1, \ldots, \theta_p)$ from samples of size n are, for large samples, approximately distributed as the multivariate normal distribution with means $\theta_1, \ldots, \theta_p$ and with a variance matrix V where

$$ (V^{-1})_{ij} = M_{ij} = -n E\left[ \frac{\partial^2 \ln f(x; \theta_1, \ldots, \theta_p)}{\partial\theta_i\, \partial\theta_j} \right]. \tag{7.34} $$

To illustrate the use of (7.33) and (7.34) we shall give the following example. 


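Example 7.8. Consider the normal population

$$ f(x; \mu, \sigma) = \frac{1}{(2\pi)^{1/2}\sigma} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right], $$

where both $\mu$ and $\sigma$ are to be estimated. From (7.33) we have

$$ \frac{\partial \ln L(\mu, \sigma)}{\partial\mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0, \qquad \frac{\partial \ln L(\mu, \sigma)}{\partial\sigma} = \frac{1}{\sigma^3} \sum_{i=1}^{n} (x_i - \mu)^2 - \frac{n}{\sigma} = 0, \tag{7.35} $$

giving

$$ \hat\mu = \bar{x}; \qquad \hat\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2. \tag{7.36} $$

Note that $\hat\sigma^2$ is a biased estimator of $\sigma^2$ [cf. Eqn (7.2)]. This is often the case with M.L.E.'s, but fortunately there usually exists a constant c, in this case n/(n - 1), such that $c\hat\sigma^2$ is unbiased.

From Theorem 7.5 the two estimators $\hat\mu$ and $\hat\sigma$ are approximately normally distributed with means $\mu$ and $\sigma$, and with a matrix M given by Eqn (7.34). Using (7.34) we have, with $\mu = \theta_1$ and $\sigma = \theta_2$,

$$ M_{11} = -n E\left[ -\frac{1}{\sigma^2} \right] = \frac{n}{\sigma^2}, $$

$$ M_{12} = M_{21} = -n E\left[ -\frac{2(x - \mu)}{\sigma^3} \right] = 0, $$

$$ M_{22} = -n E\left[ -\frac{3(x - \mu)^2}{\sigma^4} + \frac{1}{\sigma^2} \right] = \frac{2n}{\sigma^2}. $$

Thus

$$ M = \begin{pmatrix} n/\sigma^2 & 0 \\ 0 & 2n/\sigma^2 \end{pmatrix}, \tag{7.37} $$

and the variances and covariances are given by

$$ V_{ij} = (M^{-1})_{ij}. \tag{7.38} $$

Finally, from (4.16) the form of the distribution of the estimators is

$$ f(\hat\mu, \hat\sigma) = \frac{n}{\sqrt{2}\,\pi\sigma^2} \exp\left\{ -\frac{n}{2} \left[ 2\left( \frac{\hat\sigma - \sigma}{\sigma} \right)^2 + \left( \frac{\hat\mu - \mu}{\sigma} \right)^2 \right] \right\}. \tag{7.39} $$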
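The estimates of Example 7.8 and their large-sample errors can be sketched as follows. The function name and the data are hypothetical, and the errors are obtained by substituting $\hat\sigma$ for the unknown $\sigma$ in the inverse of the matrix (7.37), which is an approximation adopted for this illustration.

```python
import math

def normal_mle(xs):
    """M.L.E. of mu and sigma for a normal sample, with approximate errors."""
    n = len(xs)
    mu_hat = sum(xs) / n
    var_mle = sum((x - mu_hat) ** 2 for x in xs) / n  # Eqn (7.36), biased
    sigma_hat = math.sqrt(var_mle)
    return {
        "mu_hat": mu_hat,
        "var_mle": var_mle,
        "var_unbiased": var_mle * n / (n - 1),        # corrected by c = n/(n - 1)
        "err_mu": sigma_hat / math.sqrt(n),           # from V_11 = sigma^2 / n
        "err_sigma": sigma_hat / math.sqrt(2 * n),    # from V_22 = sigma^2 / (2n)
    }

# Made-up data, for illustration only.
result = normal_mle([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
for key, value in result.items():
    print(key, value)
```

Because the off-diagonal elements of M vanish, the errors on the two estimates are uncorrelated in this approximation.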

There is one point which should be remarked about the simultaneous estimation of several parameters, which we shall illustrate by reference to the above example. If we know $\mu$ then estimation of $\sigma^2$ alone gives

$$ \hat\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2, \tag{7.40} $$

which is not the same as Eqn (7.36) obtained from the simultaneous estimation of $\mu$ and $\sigma^2$. This is not surprising. However, from (7.36) we see that we can estimate $\mu$, independent of any possible knowledge of $\sigma^2$, to be $\bar{x}$. Thus, if we now find the estimator of $\sigma^2$ maximizing the likelihood for all samples giving the estimated value of $\hat\mu = \bar{x}$ it might be thought that Eqn (7.36) will result, whereas in fact in this latter case

$$ \hat\sigma^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2. \tag{7.41} $$

The difference between (7.36) and (7.41) is that in the former case we have considered the variations of $\ln L(\mu, \sigma)$ over all samples of size n, whereas in the latter the constraint, $\bar{x}$ = constant, has been applied, and thus lowered the number of degrees of freedom by one. For large n, of course, the difference is of no importance, but nevertheless it is a useful reminder that every parameter estimated from the sample (i.e. every constraint applied) lowers the number of degrees of freedom by one.

The maximum likelihood method has the disadvantage that in order to estimate a parameter the form of the distribution must be known. Furthermore, it often happens that $L(\theta)$ is a highly non-linear function of the parameters $\theta$, and so to maximize the likelihood may be a difficult problem. (In Appendix C we shall consider some methods which are useful for the numerical optimization of a function of several variables.) Finally, if the data under study are normally distributed then maximizing $\ln L(\theta)$ is equivalent to minimizing

$$ \chi^2 = \sum_{i=1}^{n} \left[ \frac{x_i - \mu_i(\theta)}{\sigma_i} \right]^2, $$

where $\mu_i(\theta)$ and $\sigma_i$ are the mean and standard deviation of the ith observation, which may be more useful in practice as we shall illustrate when we discuss the method of least-squares in Chapter 8.

We will conclude this section with a few brief remarks on the interpretation of maximum likelihood estimators. Bayes’ Theorem tells us that maximizing the likelihood does not necessarily maximize the a posteriori probability of an event. This is only the case if the a priori probabilities are equal or somehow “smooth”. Thus, maximum likelihood estimators (and, of course, other estimators) should always be interpreted in the light of prior knowledge. In Chapter 9 we shall show how such knowledge can formally be included in the estimation procedure. However, because it is difficult, in general, to reduce prior knowledge to the required form, the actual method of estimation is of little practical use. An alternative method is to form the product of the respective likelihood functions. This procedure is equivalent to forming a likelihood function for an experiment which includes all previous experiments.



8 Estimation II: Least-Squares Method 


The method of least-squares is an application of minimum variance estimators (which we will meet again in Chapter 9) to the multivariate problem, and is widely used in situations where a functional form is known (or assumed) to exist between the observed quantities and the parameters to be estimated. The functional form may be dictated by the requirements of a theoretical model of the data, or may be chosen arbitrarily to provide a convenient interpolation formula for use in other calculations. We will firstly consider the technique for the situation where it is most useful, that where the data depend linearly on the parameters to be estimated. In this form the least-squares method is frequently used in curve-fitting problems.


8.1 Linear least-squares 

Initially we shall formulate the method as a procedure for finding estimators $\hat\theta_i$ (i = 1, ..., p) of parameters $\theta_i$ (i = 1, ..., p) which minimize the function

$$ S = \sum_{i=1}^{n} (y_i - \eta_i)^2 = \sum_{i=1}^{n} r_i^2, \tag{8.1} $$

where

$$ \eta_i = f(x_{1i}, x_{2i}, \ldots, x_{ki}; \theta_1, \ldots, \theta_p), \qquad i = 1, \ldots, n, \tag{8.2} $$

and $y_i, x_{1i}, x_{2i}, \ldots, x_{ki}$ denote the ith set of observations on (k + 1) variables, of which only $y_i$ is random. The relation

$$ \eta = f(x_1, x_2, \ldots, x_k; \theta_1, \theta_2, \ldots, \theta_p), \tag{8.3} $$

is called the equation of the regression curve of best fit.

We shall consider the general case where the observations are correlated and have different “weights”. Suppose we make observations of a quantity y which is a function $f(x; \theta_1, \ldots, \theta_p)$ of one variable x and p parameters $\theta_i$ (i = 1, 2, ..., p). (Note that x is not a random variable and f is not a density function.) The observations $y_i$ are made at points $x_i$ and are subject to experimental errors $e_i$. If the n observations $y_i$ depend linearly on the p parameters then the observational equations may be written

$$ y_i = \sum_{k=1}^{p} \theta_k \phi_k(x_i) + e_i, \qquad i = 1, 2, \ldots, n, \tag{8.4} $$

where $\phi_k(x)$ are any linearly independent functions of x. In matrix notation Eqn (8.4) may be written

$$ Y = \Phi\Theta + E, \tag{8.5} $$

where Y and E are (n x 1) column vectors, $\Theta$ is a (p x 1) column vector, and $\Phi$ is the (n x p) matrix

$$ \Phi = \begin{pmatrix} \phi_1(x_1) & \phi_2(x_1) & \cdots & \phi_p(x_1) \\ \phi_1(x_2) & \phi_2(x_2) & \cdots & \phi_p(x_2) \\ \vdots & \vdots & & \vdots \\ \phi_1(x_n) & \phi_2(x_n) & \cdots & \phi_p(x_n) \end{pmatrix}. $$

The matrix $\Phi$ is often known as the design matrix.


8.1.1. Solution for the Parameters 

The problem is to obtain estimates $\hat\theta_k$ for the parameters. For n = p a unique solution is obtainable directly from Eqn (8.5) by a simple matrix inversion, but for the more practical case n > p the system of equations is overdetermined. In this situation no general unique solution exists, and so what we seek is a “best average solution” in some sense. Thus we seek to approximate the experimental points $y_i$ by a series of degree p, i.e.

$$ f_i = f(x_i; \theta_1, \ldots, \theta_p) = \sum_{k=1}^{p} \theta_k \phi_k(x_i). \tag{8.6} $$

Since the experimental errors are assumed to be random we would expect them to have a joint distribution with zero mean, i.e.

$$ E[Y] \equiv Y^0 = \Phi\Theta, \tag{8.7} $$

and an associated variance matrix

$$ V = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2n} \\ \vdots & \vdots & & \vdots \\ \sigma_{1n} & \sigma_{2n} & \cdots & \sigma_n^2 \end{pmatrix}, \tag{8.8} $$

where

$$ \sigma_i^2 = E[e_i^2] = \mathrm{var}\,(y_i), $$

and

$$ \sigma_{ij} = E[e_i e_j] = \mathrm{cov}\,(y_i, y_j). $$


Note that we have only assumed that the population distribution of the errors has a finite second moment. In particular, it is not necessary to assume that the distribution is normal. However, if the errors are normally distributed, as is often the case in practice, then the method of least-squares gives the same results as the method of maximum likelihood.

The quantities $r_i$ of Eqn (8.1) (called the residuals) are now replaced by

$$ r_i = y_i - f_i = y_i - \sum_{k=1}^{p} \theta_k \phi_k(x_i), \tag{8.9} $$

and we will minimize

$$ S = \sum_{i=1}^{n} \sum_{j=1}^{n} r_i (V^{-1})_{ij} r_j = R^T V^{-1} R, \tag{8.10} $$

where R is an (n x 1) column vector of residuals.

It is convenient at this stage to assume that the variance matrix can be expressed in the form

$$ V = \sigma^2 W^{-1}, \tag{8.11} $$

where $\sigma^2$ is a scale factor and W is the so-called weight matrix of the observations. In that case Eqn (8.10) becomes

$$ S = (Y - \Phi\Theta)^T W (Y - \Phi\Theta)/\sigma^2. $$

To minimize S with respect to $\Theta$ we have

$$ \frac{\partial S}{\partial\Theta} = -\frac{2}{\sigma^2} \Phi^T W (Y - \Phi\Theta) = 0, \tag{8.12} $$

giving a solution

$$ \hat\Theta = (\Phi^T W \Phi)^{-1} \Phi^T W Y, \tag{8.13} $$

or, in non-matrix notation,

$$ \hat\theta_k = \sum_{l=1}^{p} (E^{-1})_{kl} \sum_{i=1}^{n} \sum_{j=1}^{n} \phi_l(x_i) W_{ij} y_j, \tag{8.14} $$

where

$$ E_{kl} = \sum_{i=1}^{n} \sum_{j=1}^{n} \phi_k(x_i) W_{ij} \phi_l(x_j). \tag{8.15} $$

These are the so-called normal equations for the parameters. One point worth remarking about the normal equations is that they do not require a knowledge of $\sigma^2$; only the relative weight matrix W is required to estimate the parameters.
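A minimal sketch of the normal equations (8.13) is given below for the simplest case of uncorrelated observations, where W is diagonal and is passed as a list of weights. The function names, the basis functions and the data are hypothetical, and the small linear system is solved by Gaussian elimination rather than by forming the inverse explicitly.

```python
def solve_linear(a, b):
    """Solve a x = b for a small square system by Gaussian elimination with pivoting."""
    p = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, p):
            factor = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * p
    for r in range(p - 1, -1, -1):
        x[r] = (m[r][p] - sum(m[r][c] * x[c] for c in range(r + 1, p))) / m[r][r]
    return x

def least_squares(xs, ys, weights, basis):
    """Solve (Phi^T W Phi) theta = Phi^T W Y for diagonal W."""
    phi = [[f(x) for f in basis] for x in xs]  # design matrix
    p = len(basis)
    normal = [[sum(w * row[k] * row[l] for w, row in zip(weights, phi))
               for l in range(p)] for k in range(p)]
    rhs = [sum(w * row[k] * y for w, row, y in zip(weights, phi, ys))
           for k in range(p)]
    return solve_linear(normal, rhs)

basis = [lambda x: 1.0, lambda x: x]  # straight line: phi_1 = 1, phi_2 = x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]             # made-up observations
theta = least_squares(xs, ys, [1.0, 2.0, 1.0, 2.0], basis)
theta_scaled = least_squares(xs, ys, [10.0, 20.0, 10.0, 20.0], basis)
print(theta, theta_scaled)
```

Multiplying all the weights by a common factor leaves the estimates unchanged, which is the statement that only the relative weights matter.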

The estimates $\hat\theta_k$ of Eqn (8.14) have been obtained by minimizing the weighted sum of the squared residuals, and although this has an intuitive geometrical appeal it still might be thought to be a rather arbitrary procedure. However, the importance of least-squares estimates stems from their “minimum-variance” properties which are summarized by the following theorem.

Theorem 8.1. The least-squares estimate $\hat\theta_k$ of the parameters $\theta_k$ is that estimate which minimizes the variance of any linear combination of the parameters.

Proof. Consider the general linear sum of parameters

$$ L = C^T \Theta, \tag{8.16} $$

where C is a (p x 1) vector of known constant coefficients. Let G be any (n x 1) vector such that

$$ C^T = G^T \Phi. \tag{8.17} $$

The problem of minimizing the variance of L is now equivalent to minimizing the variance of $G^T Y$ subject to the constraint (8.17). Now since G is a constant vector

$$ \mathrm{var}\,(G^T Y) = G^T (\mathrm{var}\, Y) G = G^T V G. $$

Thus we can construct a variational function

$$ F = G^T V G - \Lambda^T (\Phi^T G - C), \tag{8.18} $$

where $\Lambda$ is a (p x 1) vector of Lagrange multipliers. Setting $\delta F = 0$ gives (a numerical factor being absorbed into $\Lambda$)

$$ G^T = \Lambda^T \Phi^T V^{-1}, \tag{8.19} $$

and so

$$ \Lambda^T = G^T \Phi (\Phi^T V^{-1} \Phi)^{-1}. \tag{8.20} $$

Eliminating $\Lambda^T$ between (8.19) and (8.20) gives

$$ G^T = G^T \Phi (\Phi^T V^{-1} \Phi)^{-1} \Phi^T V^{-1}. \tag{8.21} $$

If we now multiply (8.21) on the right by Y and use Eqn (8.13) we have

$$ G^T Y = (G^T \Phi) \hat\Theta = C^T \hat\Theta. \tag{8.22} $$

Thus we have shown that the value of $\Theta$ which minimizes the variance of any linear combination of the parameters is the least-squares estimate $\hat\Theta$. This result is originally due to Gauss.


8.1.2. Errors on the Parameter Estimates 

Having obtained the least-squares estimates $\hat\theta_k$ we have now to consider their variances and covariances. Returning to the solution of the normal equations, we have

$$ \hat\Theta = (\Phi^T W \Phi)^{-1} \Phi^T W Y. \tag{8.23} $$

Now, we have previously used the result that, for any linear combination of the $y_i$, say $P^T Y$, with P a constant vector,

$$ \mathrm{var}\,(P^T Y) = P^T\, \mathrm{var}\,(Y)\, P, \tag{8.24} $$

which can easily be proved from the definition of the variance matrix. Thus, applying Eqn (8.24) to $\hat\Theta$ as given by Eqn (8.23) we have

$$ \mathrm{var}\,(\hat\Theta) = (\Phi^T W \Phi)^{-1} \Phi^T W\, \mathrm{var}\,(Y)\, W \Phi (\Phi^T W \Phi)^{-1}. $$

Using

$$ \mathrm{var}\,(Y) = V = \sigma^2 W^{-1}, $$

we have

$$ \mathrm{var}\,(\hat\Theta) = \sigma^2 (\Phi^T W \Phi)^{-1}. \tag{8.25} $$

This is the variance matrix of the parameters. Unlike the estimation problem itself (cf. Eqn (8.13)), to calculate the variance matrix for $\hat\Theta$ requires a knowledge of $\sigma^2$, the scale factor in the variance matrix of the observations. Fortunately, the least-squares method allows us to calculate an unbiased estimate of $\sigma^2$ which may be used in (8.25).

To estimate $\sigma^2$ we return to Eqn (8.10), and consider the expected value
of the weighted sum of residuals $S$,

$$ \sigma^2 E[S] = E[R^T W R]. \tag{8.26} $$

When $\Theta = \hat{\Theta}$ the r.h.s. of (8.26) becomes

$$ E[R^T W (Y - \Phi\hat{\Theta})] = E[R^T W Y], $$

since

$$ R^T W \Phi \hat{\Theta} = 0 $$

is equivalent to a statement of the normal equations. Furthermore,

$$ R^T W Y = (Y^T - \hat{\Theta}^T \Phi^T) W Y = (Y^T W Y) - (\hat{\Theta}^T N \hat{\Theta}), \tag{8.27} $$

where

$$ N = \Phi^T W \Phi. $$

By using the normal equations once again Eqn (8.27) may be reduced to

$$ (Y - Y^0)^T W (Y - Y^0) - (\hat{\Theta} - \Theta)^T N (\hat{\Theta} - \Theta), $$

and thus we have arrived at the result that

$$ E[S] = E[R^T V^{-1} R] = E[(Y - Y^0)^T V^{-1} (Y - Y^0) - (\hat{\Theta} - \Theta)^T M^{-1} (\hat{\Theta} - \Theta)], \tag{8.28} $$

where

$$ M = \sigma^2 N^{-1}, $$

which, by Eqn (8.25), is the variance matrix of the parameters.

Consider the first term in (8.28). The quantity $(Y - Y^0)$ is a vector of
random variables distributed with mean zero, and with variance matrix $V$.
Thus

$$ \begin{aligned} E[(Y - Y^0)^T V^{-1} (Y - Y^0)] &= E[\operatorname{Tr}\{(Y - Y^0)^T V^{-1} (Y - Y^0)\}] \\ &= \operatorname{Tr}\{E[(Y - Y^0)(Y - Y^0)^T V^{-1}]\} \\ &= \operatorname{Tr}(V V^{-1}) = n. \end{aligned} $$

Similarly, since $M$ is the variance matrix of $\hat{\Theta}$,

$$ E[(\hat{\Theta} - \Theta)^T M^{-1} (\hat{\Theta} - \Theta)] = p. $$

Thus, from (8.28) we have

$$ E[R^T V^{-1} R] = n - p, $$

and so an unbiased estimate for $\sigma^2$ is

$$ \hat{\sigma}^2 = \frac{R^T W R}{n - p}, \tag{8.29} $$

and consequently an unbiased estimate for the variance matrix for $\hat{\Theta}$ is

$$ E = \frac{R^T W R}{n - p}\,(\Phi^T W \Phi)^{-1} = \frac{1}{n - p}\,(Y - \Phi\hat{\Theta})^T W (Y - \Phi\hat{\Theta})\,(\Phi^T W \Phi)^{-1}. \tag{8.30} $$

This matrix is also known loosely as the error matrix, and it is common
practice to quote the error on the parameter $\hat{\theta}_k$ as

$$ \Delta\hat{\theta}_k = (E_{kk})^{1/2}. $$

To find the error of the fitted value $f$ we can simply use Eqn (8.6), for $f$,
together with Eqns (8.24) and (8.25). The result, which is true for all values
of $x$, is

$$ (\Delta f)^2 = \operatorname{var} f(x) = \sum_{k=1}^{p} \sum_{l=1}^{p} \phi_k(x)\, E_{kl}\, \phi_l(x), \tag{8.31} $$

which could also have been obtained from Eqn (5.44). The least-squares
results may be used in a simple way to combine the results of several
experiments, thereby generalizing Eqns (7.24) and (7.25). The following example
will illustrate this.

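Example 8.1. An experiment measures two parameters $\theta_1$ and $\theta_2$ to be
$\theta_1^{(1)} = 1.0$ and $\theta_2^{(1)} = -1.0$ with a variance matrix

$$ \begin{pmatrix} 2.0 & -1.0 \\ -1.0 & 1.5 \end{pmatrix} 10^{-2}. $$

A second experiment finds a new value of $\theta_2$ to be $\theta_2^{(2)} = -1.1$ with variance
$10^{-2}$. We wish to combine the results of the two experiments.

The variance matrix for the two experiments is

$$ V = \begin{pmatrix} 2.0 & -1.0 & 0 \\ -1.0 & 1.5 & 0 \\ 0 & 0 & 1.0 \end{pmatrix} 10^{-2}. $$

Also, in the notation used previously

$$ \Phi^T = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix} $$

and

$$ Y = \begin{pmatrix} 1.0 \\ -1.0 \\ -1.1 \end{pmatrix}. $$

Thus we have

$$ W = V^{-1} = \frac{1}{4} \begin{pmatrix} 3 & 2 & 0 \\ 2 & 4 & 0 \\ 0 & 0 & 4 \end{pmatrix} 10^{2}, $$

and

$$ (\Phi^T W \Phi)^{-1} = \frac{1}{5} \begin{pmatrix} 8 & -2 \\ -2 & 3 \end{pmatrix} 10^{-2}. $$

Thus, from (8.13) we have

$$ \hat{\Theta} = \begin{pmatrix} 1.04 \\ -1.06 \end{pmatrix}, $$

i.e.

$$ \hat{\theta}_1 = 1.04 \quad \text{and} \quad \hat{\theta}_2 = -1.06. $$

To calculate the associated error matrix we use (8.30). From (8.29)

$$ \hat{\sigma}^2 = 0.40, $$

and hence

$$ E = \begin{pmatrix} 0.64 & -0.16 \\ -0.16 & 0.24 \end{pmatrix} 10^{-2}. $$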


It is sometimes useful to know which linear combinations of parameter 
estimates have zero covariances. Since E is a real, symmetric matrix it can 
be diagonalized by a unitary matrix U. This same matrix then transforms 
the parameter estimates into the required linear combination. 
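
The arithmetic of Example 8.1 can be checked numerically. The following is a minimal sketch in Python, assuming plain nested lists as matrices; the helper names (`transpose`, `matmul`, `inverse`, `least_squares`) are illustrative and not part of the text. It evaluates Eqns (8.23), (8.29) and (8.30) directly and should reproduce the estimates and error matrix quoted in the example.

```python
# Sketch: combine two experiments by weighted least squares (Example 8.1).
# All helper names are illustrative; plain lists are used as matrices.

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        scale = aug[c][c]
        aug[c] = [v / scale for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def least_squares(phi, w, y):
    """Return (theta_hat, sigma2_hat, error_matrix) from (8.23), (8.29), (8.30)."""
    phi_t = transpose(phi)
    n_inv = inverse(matmul(matmul(phi_t, w), phi))           # (Phi^T W Phi)^-1
    y_col = [[v] for v in y]
    theta = matmul(n_inv, matmul(matmul(phi_t, w), y_col))   # Eqn (8.23)
    fitted = matmul(phi, theta)
    r = [[yi[0] - fi[0]] for yi, fi in zip(y_col, fitted)]   # residual vector R
    s = matmul(matmul(transpose(r), w), r)[0][0]             # R^T W R
    dof = len(y) - len(phi[0])                               # n - p
    sigma2 = s / dof                                         # Eqn (8.29)
    error = [[sigma2 * v for v in row] for row in n_inv]     # Eqn (8.30)
    return [t[0] for t in theta], sigma2, error

# Data of Example 8.1.
v = [[2.0e-2, -1.0e-2, 0.0], [-1.0e-2, 1.5e-2, 0.0], [0.0, 0.0, 1.0e-2]]
phi = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
y = [1.0, -1.0, -1.1]

theta_hat, sigma2_hat, err = least_squares(phi, inverse(v), y)
print([round(t, 2) for t in theta_hat])
print(round(sigma2_hat, 2))
print([[round(e * 100, 2) for e in row] for row in err])     # in units of 10^-2
```

Explicit inversion is adequate for a 2 x 2 problem like this one; for larger or poorly conditioned systems a more careful solver would be preferable.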





We will conclude this discussion of the simple linear least-squares method
with some general remarks. Firstly, in the above discussion we have not
specified the functions $\phi_k(x)$ except that they form a linearly independent
set. If we use a simple power series for $\phi_k(x)$ then the matrix $(\Phi^T W \Phi)$ is
ill-conditioned for even quite moderate values of $k$, and the degree of
ill-conditioning increases as $n$ becomes larger. Thus serious rounding errors can
occur if $\hat{\Theta}$ is calculated from Eqn (8.13). If a power series, or similar form,
is dictated by the requirements of a particular model, the parameters of which
one requires to estimate, then one can only hope to circumvent the problem
by a judicious choice of method to invert the matrix. Such techniques can
be found in books on numerical analysis. However, if all that is required is
any form which gives an adequate fit to the data then it would clearly be
advantageous to choose functions such that the matrix to be inverted is
diagonal. Such functions are called orthogonal polynomials and their
construction is discussed in Appendix B.
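
The effect of the choice of basis on the matrix $(\Phi^T W \Phi)$ can be illustrated with a short sketch. It assumes unit weights and eleven equally spaced points, and it orthogonalises the power series over the data points by a Gram-Schmidt step; this is one possible construction and is not necessarily the one given in Appendix B. The function names are illustrative.

```python
# Sketch: normal matrix Phi^T W Phi for a power-series basis versus a basis
# orthogonalised over the data points (unit weights assumed for simplicity).

def normal_matrix(basis_values):
    """basis_values[k][i] = phi_k(x_i); returns the matrix of sums over i."""
    return [[sum(a * b for a, b in zip(fk, fl)) for fl in basis_values]
            for fk in basis_values]

def orthogonalise(basis_values):
    """Gram-Schmidt over the data points: each new vector has its components
    along the earlier ones removed, so the normal matrix becomes diagonal."""
    out = []
    for f in basis_values:
        g = list(f)
        for q in out:
            coeff = sum(a * b for a, b in zip(g, q)) / sum(b * b for b in q)
            g = [a - coeff * b for a, b in zip(g, q)]
        out.append(g)
    return out

xs = [i / 10.0 for i in range(11)]                  # 11 equally spaced points on [0, 1]
powers = [[x ** k for x in xs] for k in range(4)]   # 1, x, x^2, x^3

n_power = normal_matrix(powers)
n_orth = normal_matrix(orthogonalise(powers))

largest_off_diagonal = max(abs(n_orth[k][l])
                           for k in range(4) for l in range(4) if k != l)
print(largest_off_diagonal)            # rounding noise only
print(n_power[0][3], n_power[3][3])    # power-series terms are strongly coupled
```

With the orthogonalised basis each parameter is obtained from a single division, and adding a further term to the fit leaves the earlier coefficients unchanged.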

The second remark concerns the quality of the fit achieved by the
least-squares method. For this we will have to assume a distribution for the $y_i$,
and we will take this to be normal about $f_i$. In this case the weighted sum
of residuals $S$, of Eqn (8.10), is distributed as $\chi^2$ with $n - p$ degrees of
freedom. Thus, for a fit of given order $p$, one can calculate the probability
$P_p$ that the expected value $S_e$ is smaller than the observed value $S_0$. The order
of the fit is then increased until this probability reaches any desired level.
To increase $p$ beyond the point where $S_0 \approx (n - p)$ would result in apparently
better fits to the data. However, to do so would ignore the fact that the $y_i$
are random variables, and as such contain only a limited amount of
information.

Another test which is used to supplement the $\chi^2$-test is based on the $F$
distribution of Section 6.3. This procedure is designed to test the significance
of adding an additional term in the expansion (8.6), i.e. to answer the
question: is $\theta_p$ different from zero? If $S_p$ and $S_{p-1}$ denote the values of $S$ for
fits of order $p$ and $p - 1$, respectively, then, from the additive property of
$\chi^2$, the quantity $(S_{p-1} - S_p)$ obeys a $\chi^2$ distribution with one degree of
freedom, and is distributed independently of $S_p$ itself. Thus, the
statistic

$$ F = \frac{S_{p-1} - S_p}{S_p/(n - p)} $$

obeys an $F$ distribution with 1 and $(n - p)$ degrees of freedom. From tables
of the $F$ distribution we can now find the probability $P$ that the observed
value $F_0$ is greater than the expected value. Thus if $P_p$ corresponds to $F_0(1, n - p)$
then we may assume $\theta_p = 0$ with a probability $P_p$ of being correct. It is
still possible that even though $\theta_p = 0$ higher terms are non-zero, but in this
case the $\chi^2$-test would indicate that a satisfactory fit had not yet been achieved.
We will discuss these points in more detail in Chapter 11, Section 11.4,
after we have considered the theory of hypothesis testing in general.
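
In place of tables, the two quantities used in these tests can be evaluated directly. The sketch below is an illustration only: `chi2_upper_tail` (a hypothetical name) evaluates the upper-tail probability of the $\chi^2$ distribution from the standard series for the regularised incomplete gamma function, which is adequate for moderate values of $S$, and `f_statistic` simply forms the ratio defined above. The probability attached to $F$ itself would still have to come from tables or a statistics library.

```python
import math

def chi2_upper_tail(s, dof):
    """P(chi^2 >= s) for dof degrees of freedom, via the series for the
    regularised incomplete gamma function P(a, x) with a = dof/2, x = s/2."""
    if s <= 0.0:
        return 1.0
    a, x = dof / 2.0, s / 2.0
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= x / (a + k)
        total += term
    lower = math.exp(-x + a * math.log(x) - math.lgamma(a)) * total
    return 1.0 - lower

def f_statistic(s_prev, s_p, n, p):
    """F = (S_{p-1} - S_p) / [S_p / (n - p)], with 1 and (n - p) degrees of freedom."""
    return (s_prev - s_p) / (s_p / (n - p))

# Illustrative numbers only: n = 10 points, fits of order p - 1 = 1 and p = 2.
print(chi2_upper_tail(8.0, 10 - 2))
print(f_statistic(12.0, 8.0, 10, 2))
```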


8.2 Linear least-squares with constraints 

It sometimes happens in practice that one has information about some
of the parameters to be estimated. We will generalize the discussion of
Section 8.1 by considering the situation where the additional information
takes the form of a set of linear constraint equations of the form

$$ \sum_{k=1}^{p} C_{jk}\theta_k = Z_j, \qquad j = 1, 2, \ldots, l, $$

or, in matrix notation

$$ C\Theta = Z, \tag{8.32} $$

where the rank of $C$ is $l$. This problem can be solved if we introduce the
$(l \times 1)$ vector of Lagrange multipliers $\Lambda$. Then the variation function that
we have to consider is

$$ L = (R^T V^{-1} R) - 2\Lambda^T (C\Theta - Z), $$

and setting $\delta L = 0$ gives

$$ \delta L = 0 = 2[-Y^T V^{-1} \Phi + \hat{\Theta}_c^T (\Phi^T V^{-1} \Phi) - \Lambda^T C]\,\delta\Theta, $$

i.e.

$$ \Lambda^T C = \hat{\Theta}_c^T (\Phi^T V^{-1} \Phi) - Y^T V^{-1} \Phi, \tag{8.33} $$

where $\hat{\Theta}_c$ is the vector of estimates under the constraints.

Now we have seen previously that

$$ (Y^T V^{-1} \Phi) = \hat{\Theta}^T (\Phi^T V^{-1} \Phi), \tag{8.34} $$

where $\hat{\Theta}$ is the estimate when the constraints are removed, and using this
relation in Eqn (8.33) gives

$$ \Lambda^T C = (\hat{\Theta}_c - \hat{\Theta})^T (\Phi^T V^{-1} \Phi). \tag{8.35} $$

If we set

$$ M = \sigma^2 (\Phi^T V^{-1} \Phi) = (\Phi^T W \Phi), $$

then

$$ \sigma^2 \Lambda^T C M^{-1} C^T = (\hat{\Theta}_c - \hat{\Theta})^T C^T = Z^T - \hat{\Theta}^T C^T, $$

from which we obtain our result for $\Lambda^T$,

$$ \sigma^2 \Lambda^T = (Z^T - \hat{\Theta}^T C^T)(C M^{-1} C^T)^{-1}. \tag{8.36} $$

Substituting (8.36) into (8.35) and solving for $\hat{\Theta}_c$ gives

$$ \hat{\Theta}_c^T = \hat{\Theta}^T + (Z^T - \hat{\Theta}^T C^T)(C M^{-1} C^T)^{-1} C M^{-1}. \tag{8.37} $$

This is the solution for the least-squares estimate of $\Theta$ under the constraints.
To find the variance matrix for the estimates $\hat{\Theta}_c$ we have, from (8.37)

$$ \operatorname{var}(\hat{\Theta}_c) = \sigma^2 [M^{-1} - M^{-1} C^T (C M^{-1} C^T)^{-1} C M^{-1}], \tag{8.38} $$

which, using the definition of $M$, may be written in terms of the weight
matrix $W$.

We are now left with the problem of finding an estimate for the scale
parameter $\sigma^2$. This may be done in a similar way to the unconstrained
problem. Thus we consider the expected value of the weighted sum of the
residuals under the constraints. This is

$$ E[S] = E[(R^T V^{-1} R) + (\hat{\Theta}_c - \hat{\Theta})^T (\Phi^T V^{-1} \Phi)(\hat{\Theta}_c - \hat{\Theta})], \tag{8.39} $$

where $R$ is the vector of residuals without constraints as defined previously
in Eqn (8.10). Using the technique previously used we can show that the
second term has an expected value of $l$, where $l$ is the rank of the constraint
matrix, $C$, and we have already shown that the expected value of the first
term is $(n - p)$. So an unbiased estimate of $\sigma^2$ is

$$ \hat{\sigma}^2 = \frac{(R^T W R) + (\hat{\Theta}_c - \hat{\Theta})^T (\Phi^T W \Phi)(\hat{\Theta}_c - \hat{\Theta})}{n - p + l}. \tag{8.40} $$

The second term may be written in a form which is independent of $\hat{\Theta}_c$ by
using (8.37) for $(\hat{\Theta}_c - \hat{\Theta})$. This gives

$$ \hat{\sigma}^2 = \frac{(R^T W R) + (Z - C\hat{\Theta})^T (C M^{-1} C^T)^{-1} (Z - C\hat{\Theta})}{n - p + l}. \tag{8.41} $$

Finally, the error matrix for the parameters $\hat{\Theta}_c$ is given by

$$ E_c = \hat{\sigma}^2 [M^{-1} - M^{-1} C^T (C M^{-1} C^T)^{-1} C M^{-1}], \tag{8.42} $$

where $\hat{\sigma}^2$ is given by (8.41).

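Example 8.2. Consider the estimation problem of Example 8.1, but now
with the constraint

$$ \theta_1 + \theta_2 = 0. $$

With this constraint we have

$$ C^T = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad \text{and} \quad Z = 0, $$

and we have already calculated $M^{-1}$ in Example 8.1. It is

$$ M^{-1} = \frac{1}{5} \begin{pmatrix} 8 & -2 \\ -2 & 3 \end{pmatrix} 10^{-2}. $$

Thus, using

$$ \hat{\Theta} = \begin{pmatrix} 1.04 \\ -1.06 \end{pmatrix}, $$

as obtained in Example 8.1, direct calculation from (8.37) gives

$$ \hat{\Theta}_c^T = (1.057 \quad -1.057). $$

Also, using (8.41) and (8.42), we find the associated error matrix to be

$$ E_c = \begin{pmatrix} 0.122 & -0.122 \\ -0.122 & 0.122 \end{pmatrix} 10^{-2}, $$

which is singular, as expected.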
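
The numbers in Example 8.2 follow from Eqns (8.37), (8.41) and (8.42). The sketch below assumes a single constraint ($l = 1$), so that $C M^{-1} C^T$ is a scalar and no matrix inversion is needed; the inputs are the results of Example 8.1 and the variable names are illustrative.

```python
# Sketch of Eqns (8.37), (8.41) and (8.42) for a single linear constraint (l = 1).
# Inputs are taken from Example 8.1.

m_inv = [[1.6e-2, -0.4e-2], [-0.4e-2, 0.6e-2]]  # (1/5)[[8, -2], [-2, 3]] x 10^-2
theta = [1.04, -1.06]                           # unconstrained estimates
rwr = 0.40                                      # R^T W R from Example 8.1
n, p, l = 3, 2, 1
c = [1.0, 1.0]                                  # constraint theta_1 + theta_2 = 0
z = 0.0

m_inv_ct = [sum(m_inv[i][j] * c[j] for j in range(2)) for i in range(2)]  # M^-1 C^T
c_m_ct = sum(c[i] * m_inv_ct[i] for i in range(2))                        # C M^-1 C^T
gap = z - sum(c[i] * theta[i] for i in range(2))                          # Z - C theta_hat

# M^-1 is symmetric, so C M^-1 is the transpose of M^-1 C^T.
theta_c = [theta[i] + gap * m_inv_ct[i] / c_m_ct for i in range(2)]       # Eqn (8.37)
sigma2 = (rwr + gap * gap / c_m_ct) / (n - p + l)                         # Eqn (8.41)
e_c = [[sigma2 * (m_inv[i][j] - m_inv_ct[i] * m_inv_ct[j] / c_m_ct)
        for j in range(2)] for i in range(2)]                             # Eqn (8.42)

print([round(t, 3) for t in theta_c])
print([[round(e * 100, 3) for e in row] for row in e_c])                  # units of 10^-2
```

The determinant of `e_c` vanishes to rounding accuracy, which is the singularity noted in the example: once the constraint is imposed only one combination of the two parameters is free.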


8.3 Non-linear least-squares 

If the fitting functions are not linear in the parameters then the weighted
sum of residuals that we have to minimize is

$$ S = [Y - F(\Theta)]^T W [Y - F(\Theta)]. \tag{8.43} $$

Differentiating $S$ with respect to $\Theta$ and setting the result to zero leads to
a set of non-linear simultaneous equations, and consequently presents a
difficult problem to be solved. In practice $S$ is minimized directly by an
iterative procedure, starting from some initial estimates for $\Theta$, which may
be suggested by the theoretical model, or in extreme situations may be little
more than educated guesses. We will illustrate how such a scheme might
in principle be applied, but will defer to later a serious discussion of practical
methods.

Let the initial estimate of $\Theta$ be $\Theta_0$. Then, if $\Theta_0$ is close enough to the
“true” values $\Theta$ we may expand the quantity

$$ Y - F(\Theta) $$

in a Taylor series about $\Theta_0$ and keep only the first term. Thus,

$$ \Delta_0 = Y - F(\Theta_0) \approx \frac{\partial F(\Theta_0)}{\partial \Theta}\,\delta_0, \tag{8.44} $$

where $\delta_0$ is a vector of small increments of $\Theta$. The problem of calculating
$\delta_0$ is now reduced to one of linear least-squares, since both $\Delta_0$, and the
design matrix

$$ \Phi_0 = \frac{\partial F(\Theta_0)}{\partial \Theta} $$

are obtainable. Given a solution for $\delta_0$ from the normal equations, a new
approximation

$$ F(\Theta_1) = F(\Theta_0 + \delta_0), $$

may be calculated. This in turn will lead to a new design matrix

$$ \Phi_1 = \frac{\partial F(\Theta_1)}{\partial \Theta}, $$

and a new vector $\Delta_1$, and hence, via the normal equations, to a new
incremental vector $\delta_1$. This linearization procedure may now be iterated until
the changes in $\Theta$ from one iteration to the next one are very small. At the
close of the iterations the variance matrix for the parameters is again taken
to be the inverse of the matrix of the normal equations.

As we have emphasised, however, the above procedure is only to illustrate
a possible method of finding the minimum of $S$. In practice several
difficulties could occur, e.g. the initial estimates $\Theta_0$ could be such as to invalidate
the truncation of the Taylor series at its first term. In general such a method
is not sure of converging to any value, let alone to values representing a
true minimum of $S$.

The problem of minimizing $S$ is an example of a more general class of
problems which come under the heading of “optimization of a function of
several variables” and is an active field of research at present. In Appendix C
we consider some of the current methods which have proved successful in
practice.
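
The linearization scheme of Eqn (8.44) can be sketched for a simple case. The example below assumes a two-parameter model $f(x) = a\,e^{-bx}$, unit weights and noise-free synthetic data, all chosen purely for illustration; the function names are hypothetical. It is the plain iteration described above, with none of the safeguards (step control, convergence checks on $S$) that a practical method would need.

```python
import math

def model(theta, x):
    a, b = theta
    return a * math.exp(-b * x)

def jacobian_row(theta, x):
    a, b = theta
    e = math.exp(-b * x)
    return [e, -a * x * e]           # dF/da, dF/db

def gauss_newton(xs, ys, theta0, max_iter=50, tol=1e-12):
    theta = list(theta0)
    for _ in range(max_iter):
        # Linearised problem: Delta = Y - F(theta) ~ Phi delta, cf. Eqn (8.44).
        delta_y = [y - model(theta, x) for x, y in zip(xs, ys)]
        phi = [jacobian_row(theta, x) for x in xs]
        # Normal equations (Phi^T Phi) delta = Phi^T Delta for two parameters.
        n00 = sum(r[0] * r[0] for r in phi)
        n01 = sum(r[0] * r[1] for r in phi)
        n11 = sum(r[1] * r[1] for r in phi)
        g0 = sum(r[0] * d for r, d in zip(phi, delta_y))
        g1 = sum(r[1] * d for r, d in zip(phi, delta_y))
        det = n00 * n11 - n01 * n01
        step = [(n11 * g0 - n01 * g1) / det, (n00 * g1 - n01 * g0) / det]
        theta = [t + s for t, s in zip(theta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return theta

xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ys = [2.0 * math.exp(-0.7 * x) for x in xs]    # noise-free synthetic data
fit = gauss_newton(xs, ys, theta0=[1.5, 0.5])
print(fit)
```

Starting values far from the minimum can make the same loop diverge, which is the difficulty referred to in the text.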



9 Estimation III: Other Methods 


Estimation by the method of maximum likelihood, as described in Chapter 
7, is a very general technique. However, several other methods are also in 
common use, and may be more suitable for certain applications. We shall 
briefly describe some of them below. 


9.1 Minimum chi-square 

Consider the case in which all the values of a population fall into $k$ mutually
exclusive categories $c_i$ $(i = 1, 2, \ldots, k)$. Let $p_i$ denote the proportion of values
falling into category $c_i$ where

$$ \sum_{i=1}^{k} p_i = 1. \tag{9.1} $$

Furthermore, in a random sample of $n$ observations, let $o_i$ and $e_i = np_i$
denote the observed and expected frequency in category $c_i$ where

$$ \sum_{i=1}^{k} o_i = \sum_{i=1}^{k} e_i = n. \tag{9.2} $$

Now in Section 4.6 we considered the multinomial distribution with density
function

$$ f(r_1, r_2, \ldots, r_{k-1}) = \frac{n!}{\prod_{i=1}^{k} r_i!} \prod_{i=1}^{k} p_i^{r_i}, \tag{9.3} $$

where $r_i$ denotes the frequency of observations in the $i$th category in which
the true proportion of observations is $p_i$ $(i = 1, \ldots, k)$. We recall that the
multinomial density function gives exact probabilities for any set of observed
frequencies

$$ r_1 = o_1; \quad r_2 = o_2; \quad \ldots; \quad r_k = o_k. \tag{9.4} $$

Now each $r_i$ is distributed binomially and we have seen in previous sections
that the binomial distribution tends rapidly to a Poisson distribution with
both mean and variance equal to $np_i$. The Poisson distribution in turn tends
to a normal distribution as $np_i$ increases. Conventionally the Poisson
distribution is considered approximately normal if $\mu \geq 9$. Thus if $np_i \geq 9$, $r_i$
is approximately normally distributed with mean and variance both given
by $np_i$. It follows that, by converting to standard measure, the statistic

$$ u_i = \frac{r_i - np_i}{(np_i)^{1/2}} \tag{9.5} $$

is approximately normally distributed with mean zero and unit variance.
Furthermore

$$ \chi^2 = \sum_{i=1}^{k} u_i^2 = \sum_{i=1}^{k} \frac{(r_i - np_i)^2}{np_i} = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i} \tag{9.6} $$

is distributed as $\chi^2$ with $(k - 1)$ degrees of freedom.

A more common situation that arises in practice is where the generating
density function is not completely specified, but instead contains a number
of unknown parameters. If the observed frequencies are used to provide
estimates $\hat{p}_i$ of the $p_i$, then the quantity analogous to $\chi^2$ of Eqn (9.6) is

$$ \chi'^2 = \sum_{i=1}^{k} \frac{(o_i - n\hat{p}_i)^2}{n\hat{p}_i}. \tag{9.7} $$

There now arise two questions: (1) what is the best way of estimating the
$p_i$, and (2) what is the distribution of $\chi'^2$? There are clearly many different
methods available to estimate the $p_i$ but one which is widely used is to
choose values which minimize $\chi'^2$. This may in general be a difficult problem,
and is another example of the general class of optimization problems
mentioned at the end of Chapter 8, and which are discussed more fully in
Appendix C. It can be shown that for a wide class of methods of estimating the
$p_i$, including that of minimum chi-square, $\chi'^2$ is asymptotically distributed
as $\chi^2$ with $(k - 1 - c)$ degrees of freedom where $c$ is the number of
independent parameters of the distribution used to estimate the $p_i$.

In general, if $x_i$ is a sample of size $n$ from a multinomial population with
mean $\mu(\theta)$ and variance matrix $V(\theta)$, where $\theta$ is to be estimated, then the
value $\hat{\theta}(x_1, \ldots, x_n)$ which minimizes

$$ \chi^2 = (x - \mu(\theta))^T [V(\theta)]^{-1} (x - \mu(\theta)), $$

i.e. the minimum-$\chi^2$ estimate of $\theta$, is known to be consistent,
asymptotically efficient, and asymptotically normally distributed if $x$ is distributed
like the binomial, Poisson or normal distribution (and many others).
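
As an illustration of minimum chi-square estimation, the sketch below minimizes $\chi'^2$ of Eqn (9.7) for a hypothetical one-parameter model in which three categories have proportions $q^2$, $2q(1-q)$ and $(1-q)^2$. The model, the observed counts and the function names are assumptions made for the example, and a simple golden-section search stands in for the more general optimization methods referred to above.

```python
import math

def chi2_prime(q, observed):
    """Eqn (9.7) for a hypothetical one-parameter model with three categories
    whose proportions are q^2, 2q(1-q) and (1-q)^2."""
    n = sum(observed)
    probs = [q * q, 2.0 * q * (1.0 - q), (1.0 - q) ** 2]
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

def golden_section_min(f, lo, hi, tol=1e-10):
    """Minimise a unimodal function of one variable on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c                  # minimum lies in [a, d]
            c = b - ratio * (b - a)
        else:
            a, c = c, d                  # minimum lies in [c, b]
            d = a + ratio * (b - a)
    return 0.5 * (a + b)

observed = [30, 50, 20]                  # illustrative counts, n = 100
q_hat = golden_section_min(lambda q: chi2_prime(q, observed), 0.01, 0.99)
chi2_min = chi2_prime(q_hat, observed)
dof = len(observed) - 1 - 1              # k - 1 - c with c = 1 fitted parameter
print(q_hat, chi2_min, dof)
```

The minimum value `chi2_min` would then be compared with the $\chi^2$ distribution for `dof` degrees of freedom.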

Suppose we have drawn samples from $n$ populations each with the same
mean $\mu$ but with different variances. Let the sample means be $x_i$ and the
corresponding variances be $\sigma_i^2 = \sigma^2(x_i)$. We will consider the problem of
combining these samples to obtain an estimate of the population mean $\mu$.
Since $x_i$ is an unbiased estimate of $\mu$, the quantity

$$ \bar{x} = \sum_{i=1}^{n} a_i x_i \tag{9.8} $$

with

$$ \sum_{i=1}^{n} a_i = 1 \tag{9.9} $$

is also an unbiased estimate, regardless of the values of the coefficients $a_i$,
so the problem is one of selecting a suitable set of $a_i$. One criterion that is
employed is to choose the $a_i$ such that $\bar{x}$ has minimum variance.

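Now we have

$$ \operatorname{var}(\bar{x}) = \operatorname{var}\left(\sum_{i=1}^{n} a_i x_i\right) = \sum_{i=1}^{n} a_i^2 \operatorname{var}(x_i) = \sum_{i=1}^{n} a_i^2 \sigma_i^2, \tag{9.10} $$

and we wish to minimize (9.10) subject to the constraint (9.9). Once again
we will use the method of Lagrange multipliers. Thus, if we introduce a
multiplier $\lambda$, then the variation function is

$$ L = \sum_{i=1}^{n} a_i^2 \sigma_i^2 + \lambda\left(\sum_{i=1}^{n} a_i - 1\right), $$

and

$$ \frac{\partial L}{\partial a_i} = 0 = 2 a_i \sigma_i^2 + \lambda. $$

Thus

$$ a_i = -\frac{\lambda}{2\sigma_i^2}, \tag{9.11} $$

and so, from (9.9)

$$ \lambda = -2 \Big/ \sum_{i=1}^{n} (1/\sigma_i^2). \tag{9.12} $$

Substituting (9.12) in (9.11) gives the solution for the coefficient $a_i$,

$$ a_i = \frac{1/\sigma_i^2}{\sum_{j=1}^{n} (1/\sigma_j^2)}, $$

and so, from (9.8)

$$ \bar{x} = \frac{\sum_{i=1}^{n} (x_i/\sigma_i^2)}{\sum_{i=1}^{n} (1/\sigma_i^2)}. \tag{9.13} $$

The variance of $\bar{x}$ may be found from (9.10). It is

$$ \operatorname{var}(\bar{x}) = 1 \Big/ \sum_{i=1}^{n} (1/\sigma_i^2). \tag{9.14} $$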

The above discussion illustrates the use of the minimum variance criterion 
to obtain an estimate. In the example given it is interesting to note that 
the results obtained by the minimum variance method are identical to those 
obtained by the maximum likelihood method (cf. Example 7.6), where the 
population distribution was assumed to be normal. In general, for other 
population densities, the results of the two methods will differ, however. 
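
Eqns (9.13) and (9.14) translate directly into a few lines of code. The sketch below is illustrative; the function name `combine` and the input numbers are not from the text.

```python
def combine(means, variances):
    """Minimum-variance combination of independent estimates of a common mean."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * m for w, m in zip(weights, means)) / total   # Eqn (9.13)
    return mean, 1.0 / total                                    # Eqn (9.14)

# Two measurements of the same quantity with variances 1 and 4.
mean, var = combine([10.0, 12.0], [1.0, 4.0])
print(mean, var)
```

The combined variance is always smaller than the smallest of the individual variances, so even a poor measurement adds some information.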


9.3 Bayes’ estimators 

In Section 2.3 we introduced Bayes’ Theorem and showed that to maximize 
the a posteriori probability required a knowledge of a priori probabilities. 
In general, these latter probabilities are not completely known. However, 
there occur cases where some partial information is available, and in these 
circumstances it is clearly advantageous to include it in the estimation 
procedure. 

We will consider the case where the prior information about the parameter
is such that the parameter itself can be formally regarded as a random variable
with an associated density $p(\theta)$. The form for $p(\theta)$ could be obtained, for
example, by plotting all previous estimates of $\theta$. This will very often be
found to be an approximately Gaussian form, and from the results estimates
of the mean and variance of the associated normal distribution could be
made. In these cases where both the usual variable and the parameter may
be regarded as random variables we will denote the corresponding density
as $f_R(x; \theta)$.





Before proceeding to a formal definition of Bayes’ estimators we shall 
need a few definitions of subsidiary quantities. 

Firstly, borrowing from the field of decision theory we shall consider the
loss function $l(\hat{\theta}; \theta)$ which, expressed loosely, gives the “loss” incurred by
using the estimate $\hat{\theta}$ instead of the true value $\theta$. In practice it is difficult to
know what form to assume for the loss function, but a simple, common-sense
form which suggests itself is

$$ l(\hat{\theta}; \theta) = (\hat{\theta} - \theta)^2. \tag{9.15} $$

(In fact a loss function which is bounded by zero, as in the example we have
chosen, is an example of a more general function found in decision theory,
called a risk function.) The other quantities we shall need follow directly
from work of previous chapters. Thus we shall call

$$ j(x_1, x_2, \ldots, x_n, \theta) = f(x_1, x_2, \ldots, x_n | \theta)\, p(\theta), \tag{9.16} $$

the joint density of $x_1, \ldots, x_n$, and $\theta$, and

$$ m(x_1, x_2, \ldots, x_n) = \int_{-\infty}^{\infty} j(x_1, x_2, \ldots, x_n, \theta)\, d\theta, \tag{9.17} $$

the marginal distribution of the $x$'s. From Eqn (3.30) it then follows that
the conditional distribution of $\theta$ given $x_1, \ldots, x_n$ is

$$ c(\theta | x_1, x_2, \ldots, x_n) = \frac{j(x_1, x_2, \ldots, x_n, \theta)}{m(x_1, x_2, \ldots, x_n)} = \frac{f(x_1, x_2, \ldots, x_n | \theta)\, p(\theta)}{m(x_1, x_2, \ldots, x_n)}. \tag{9.18} $$

This is called the a posteriori density. We may now define a Bayes’ estimator.

Definition 9.1. Let $x_1, \ldots, x_n$ be a random sample of size $n$ drawn from
a density $f_R(x; \theta)$. Let $p(\theta)$ be the density of $\theta$, and $f(x_1, \ldots, x_n | \theta)$ be the
conditional density of the $x$'s given $\theta$. Furthermore, let $c(\theta | x_1, \ldots, x_n)$ be the
conditional density of $\theta$ given the $x$'s, and let $l(\hat{\theta}; \theta)$ be the loss function.
Then the Bayes’ estimator of $\theta$ is that function defined by

$$ \hat{\theta} = d(x_1, x_2, \ldots, x_n), $$

which minimizes the quantity

$$ B(\hat{\theta}; x_1, \ldots, x_n) = \int_{-\infty}^{\infty} l(\hat{\theta}; \theta)\, c(\theta | x_1, \ldots, x_n)\, d\theta. \tag{9.19} $$

We will illustrate the use of this definition by an example.
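
Example 9.1. Let $x_i$ $(i = 1, 2, \ldots, n)$ be a random sample of size $n$ drawn
from a normal density with unit variance

$$ f_R(x; \mu) = \frac{1}{(2\pi)^{1/2}} \exp[-\tfrac{1}{2}(x - \mu)^2]. $$

From the definitions above

$$ f(x_1, \ldots, x_n | \mu) = \frac{1}{(2\pi)^{n/2}} \exp\left[-\tfrac{1}{2}\left(\sum_{i=1}^{n} x_i^2 - 2\mu \sum_{i=1}^{n} x_i + n\mu^2\right)\right]. $$

Now if we assume that our information about $\mu$ takes the form of knowing
that it is normally distributed, i.e.

$$ p(\mu) = \frac{1}{(2\pi)^{1/2}} \exp(-\mu^2/2), \qquad -\infty < \mu < \infty, $$

then from (9.16) and (9.17)

$$ j(x_1, \ldots, x_n, \mu) = \frac{1}{(2\pi)^{(n+1)/2}} \exp\left[-\tfrac{1}{2}\left(\sum x_i^2 + (n + 1)\mu^2 - 2\mu n\bar{x}\right)\right], $$

and

$$ m(x_1, \ldots, x_n) = \frac{\exp(-\tfrac{1}{2}\sum x_i^2)}{(2\pi)^{(n+1)/2}} \int_{-\infty}^{\infty} \exp[-\tfrac{1}{2}((n + 1)\mu^2 - 2\mu n\bar{x})]\, d\mu = \frac{1}{(2\pi)^{n/2}(n + 1)^{1/2}} \exp\left[-\tfrac{1}{2}\left(\sum x_i^2 - \frac{n^2\bar{x}^2}{n + 1}\right)\right]. $$

Thus, from (9.18), after some simplification, we have

$$ c(\mu | x_1, \ldots, x_n) = \left(\frac{n + 1}{2\pi}\right)^{1/2} \exp\left\{-\frac{(n + 1)}{2}\left[\mu - \frac{n\bar{x}}{n + 1}\right]^2\right\}. \tag{9.20} $$

If we further assume the following form for the loss function,

$$ l(\hat{\mu}; \mu) = (\hat{\mu} - \mu)^2, \tag{9.21} $$

then using (9.20) and (9.21) in (9.19) gives

$$ B(\hat{\mu}; x_1, \ldots, x_n) = \left(\frac{n + 1}{2\pi}\right)^{1/2} \int_{-\infty}^{\infty} (\hat{\mu} - \mu)^2 \exp\left\{-\frac{(n + 1)}{2}\left[\mu - \frac{n\bar{x}}{n + 1}\right]^2\right\} d\mu. $$

To minimize $B$ we have

$$ \frac{\partial B(\hat{\mu}; x_1, \ldots, x_n)}{\partial \hat{\mu}} = 0, $$

giving

$$ \hat{\mu} = \frac{1}{n + 1} \sum_{i=1}^{n} x_i = \frac{n\bar{x}}{n + 1}. \tag{9.22} $$

This is the Bayes’ estimator for $\mu$.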

Under very general conditions it can be shown that Bayes’ estimators,
independent of the assumed prior distributions, are efficient, consistent, and
a function of sufficient estimators. Furthermore, Bayes’ estimators tend to
M.L. estimators for large samples. The disadvantage in using the method
in practice is the necessity of assuming a form for both $p(\theta)$ and $l(\hat{\theta}; \theta)$.
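
The result (9.22) can be checked numerically. For the quadratic loss (9.21) the quantity $B$ is minimized by the mean of the a posteriori density, so the sketch below integrates likelihood times prior on a grid and compares the resulting mean with $n\bar{x}/(n + 1)$. The sample values, grid limits and function name are illustrative assumptions.

```python
import math

def posterior_mean(sample, lo=-10.0, hi=10.0, steps=20001):
    """Posterior mean of mu for unit-variance normal data and a standard normal
    prior, by trapezoidal integration of likelihood x prior on a grid."""
    h = (hi - lo) / (steps - 1)
    num = den = 0.0
    for k in range(steps):
        mu = lo + k * h
        log_post = -0.5 * sum((x - mu) ** 2 for x in sample) - 0.5 * mu * mu
        w = math.exp(log_post) * (0.5 if k in (0, steps - 1) else 1.0)
        num += mu * w
        den += w
    return num / den          # normalising constants cancel in the ratio

sample = [0.8, 1.3, 1.1, 0.6]
n = len(sample)
xbar = sum(sample) / n
bayes = n * xbar / (n + 1)    # Eqn (9.22)
numeric = posterior_mean(sample)
print(bayes, numeric)
```

The estimate is pulled from $\bar{x}$ towards the prior mean of zero by the factor $n/(n + 1)$, an effect which disappears as $n$ becomes large.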


9.4 Method of moments 

In Section 3.4 (Theorem 3.2) we saw that two distributions with a common 
m.g.f. were equal. This provides a method for estimating the parameters 
of a distribution by estimating the moments of the distribution. 

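Let $f(x; \theta_1, \ldots, \theta_p)$ be a univariate density function with $p$ parameters
$\theta_i$ $(i = 1, 2, \ldots, p)$, and let the first $p$ moments about the origin be

$$ \mu_j'(\theta_1, \ldots, \theta_p) = \int_{-\infty}^{\infty} x^j f(x; \theta_1, \ldots, \theta_p)\, dx, \qquad j = 1, 2, \ldots, p. \tag{9.23} $$

Let $x_1, \ldots, x_n$ be a random sample of size $n$ drawn from the density $f$. The first $p$
sample moments are given by

$$ m_j' = \frac{1}{n} \sum_{i=1}^{n} x_i^j. \tag{9.24} $$

The estimators $\hat{\theta}_i$ of the parameters are obtained from the solutions of
the $p$ equations

$$ m_j' = \mu_j', \qquad j = 1, 2, \ldots, p. \tag{9.25} $$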
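
Example 9.2. Consider the normal distribution, for which we have
previously seen (Eqn (4.6))

$$ \mu_1' = \mu; \qquad \mu_2' = \sigma^2 + \mu^2. \tag{9.26} $$

The sample moments are

$$ m_1' = \frac{1}{n} \sum_{i=1}^{n} x_i; \qquad m_2' = \frac{1}{n} \sum_{i=1}^{n} x_i^2. \tag{9.27} $$

Applying (9.25) gives

$$ \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}, \tag{9.28} $$

and

$$ \hat{\sigma}^2 + \hat{\mu}^2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2, \tag{9.29} $$

i.e.

$$ \hat{\sigma}^2 = \frac{1}{n} \left[\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right] = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2. \tag{9.30} $$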


Thus, the estimators obtained by the method of moments are, for this 
example, the same as those obtained by the maximum likelihood method. 
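
For the normal case of Example 9.2 the method reduces to two sample averages. A minimal sketch, with an illustrative function name and data:

```python
def normal_moment_estimates(xs):
    """Method-of-moments estimates (mu_hat, sigma2_hat) for a normal population."""
    n = len(xs)
    m1 = sum(xs) / n                   # first sample moment, Eqn (9.24) with j = 1
    m2 = sum(x * x for x in xs) / n    # second sample moment, j = 2
    return m1, m2 - m1 * m1            # from (9.28) and (9.30)

mu_hat, sigma2_hat = normal_moment_estimates([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(mu_hat, sigma2_hat)
```

Note that the variance estimate carries the divisor $n$, as in Eqn (9.30), and not $n - 1$.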

In some applications where the population density function is not
completely known it may be advantageous to use particular linear combinations
of moments. Consider, for example, a density function $f(x; \theta_1, \ldots, \theta_p)$, which
is unknown, but may be expanded in the form

$$ f(x; \theta_1, \ldots, \theta_p) = \sum_{j=1}^{p} \theta_j P_j(x), \tag{9.31} $$

where $P_j(x)$ is a set of orthogonal polynomials normalized such that

$$ \int P_i(x) P_j(x)\, dx = \begin{cases} \phi_i, & i = j, \\ 0, & i \neq j. \end{cases} \tag{9.32} $$

The population moments deduced from (9.31) are

$$ \mu_r' = \int \sum_{j=1}^{p} \theta_j P_j(x)\, x^r\, dx. \tag{9.33} $$

However, we may also consider the linear combination of moments given by

$$ \Omega_i = \int \sum_{j=1}^{p} \theta_j P_j(x) P_i(x)\, dx, \tag{9.34} $$

which, by (9.32), are given by

$$ \Omega_i = \theta_i \phi_i. \tag{9.35} $$

The equivalent sample moments are

$$ \omega_i = \frac{1}{n} \sum_{j=1}^{n} P_i(x_j), \tag{9.36} $$

and so, by equating the two, we have

$$ \hat{\theta}_i = \frac{1}{n\phi_i} \sum_{j=1}^{n} P_i(x_j). \tag{9.37} $$

This method is useful for finding the angular distribution coefficients
in the expansion of, e.g. a differential cross-section, i.e.

$$ \frac{d\sigma}{d\cos\theta} = \sum_{j} a_j P_j(\cos\theta), \tag{9.38} $$

where $P_j$ is a Legendre function. In this case, for a distribution normalized
to unity, $\phi_j = 2/(2j + 1)$ and

$$ \hat{a}_j = \frac{2j + 1}{2n} \sum_{i=1}^{n} P_j(\cos\theta_i), \tag{9.39} $$

and since the data do not have to be grouped this method of estimation is
particularly appropriate for small samples.
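
A short sketch of the moment estimate of Eqn (9.37) for a Legendre expansion follows. It assumes that the angular distribution is normalized to unity on $-1 \leq \cos\theta \leq 1$, for which the Legendre polynomials have $\phi_j = 2/(2j + 1)$, and it includes the $j = 0$ term; the function names and the two-event "sample" are purely illustrative.

```python
def legendre(j, x):
    """Legendre polynomial P_j(x) from the three-term recurrence
    (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}."""
    p_prev, p = 1.0, x
    if j == 0:
        return p_prev
    for k in range(1, j):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def legendre_coefficients(cos_thetas, max_order):
    """Eqn (9.37) with phi_j = 2 / (2j + 1): a_j = (2j + 1)/(2n) * sum_i P_j(cos theta_i)."""
    n = len(cos_thetas)
    return [(2 * j + 1) / (2.0 * n) * sum(legendre(j, c) for c in cos_thetas)
            for j in range(max_order + 1)]

coeffs = legendre_coefficients([-0.5, 0.5], max_order=2)
print(coeffs)
```

Each event contributes to every coefficient individually, so no binning in $\cos\theta$ is required.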

Many examples exist, however, where the data are already grouped, e.g. in
the form of a histogram, and in this case some error is introduced if the
moments of the sample are calculated by assuming that the frequencies
are concentrated at the mid-points of the intervals. In many cases it is
possible to make corrections for this effect by a method due to Sheppard.
If $m_r'$ are the true moments, and $\bar{m}_r'$ the moments as calculated from the
grouped data with interval width $h$, then

$$ \begin{aligned} m_1' &= \bar{m}_1', \\ m_2' &= \bar{m}_2' - \tfrac{1}{12} h^2, \\ m_3' &= \bar{m}_3' - \tfrac{1}{4} \bar{m}_1' h^2, \\ m_4' &= \bar{m}_4' - \tfrac{1}{2} \bar{m}_2' h^2 + \tfrac{7}{240} h^4, \end{aligned} $$

and, in general

$$ m_r' = \sum_{j=0}^{r} \binom{r}{j} (2^{1-j} - 1) B_j h^j \bar{m}_{r-j}', $$

where $B_j$ is the Bernoulli number of order $j$. The circumstances under which
it is valid to apply these corrections are discussed in detail in the book of
Kendall and Stuart.
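
The coefficients in Sheppard's corrections can be generated from the general formula. The sketch below uses exact fractions and the convention $B_1 = -\tfrac{1}{2}$ for the Bernoulli numbers; the function names are illustrative. It returns, for a given order $r$, the multipliers of $h^j \bar{m}_{r-j}'$ for $j = 0, 1, \ldots, r$.

```python
from fractions import Fraction
from math import comb

def bernoulli(m):
    """Bernoulli numbers B_0..B_m (convention B_1 = -1/2) from the recurrence
    sum_{j=0}^{k} C(k+1, j) B_j = 0 for k >= 1."""
    b = [Fraction(1)]
    for k in range(1, m + 1):
        b.append(-sum(comb(k + 1, j) * b[j] for j in range(k)) / (k + 1))
    return b

def sheppard_coefficients(r):
    """Coefficients C(r, j) (2^(1-j) - 1) B_j multiplying h^j times the grouped
    moment of order r - j in the corrected moment of order r."""
    b = bernoulli(r)
    return [comb(r, j) * (Fraction(2) ** (1 - j) - 1) * b[j] for j in range(r + 1)]

for r in (2, 3, 4):
    print(r, [str(c) for c in sheppard_coefficients(r)])
```

The odd-order terms vanish, either because $2^{1-j} - 1 = 0$ for $j = 1$ or because the odd Bernoulli numbers beyond $B_1$ are zero, which is why only even powers of $h$ appear in the corrections.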

The modifications necessary to the above simple account in order to make 
the method of moments practically useful are similar to those which have 
been discussed for the least-squares method, and so we will not discuss the 
method further. 

Under quite general conditions, it can be shown that estimators obtained 
by the method of moments are consistent, but not, in general, most efficient. 



10 Confidence Intervals and Regions 

In Chapters 7, 8 and 9 we have discussed what is usually referred to as
point estimation, i.e. the estimation of the value of a parameter. In practice,
however, point estimation alone is not enough. It is necessary to supply also
some statement about the error on the estimate. This problem, known as
interval estimation, will be examined in this chapter.

10.1 Introduction 

Our aim is to find an interval about the estimator $\hat{\theta}$ such that we may
make probabilistic statements concerning the probability of the true value $\theta$
being within the interval. A method which is applicable in many cases is
the following. One finds, if possible, a function of the sample data and the
parameter to be estimated, say $u$, which has a distribution independent of
the parameter. Then a probability statement of the form

$$ P[u_1 \leq u \leq u_2] = p, $$

is constructed and converted into a probability statement about the
parameter to be estimated. It is not always possible to find such a function, and
in these cases more general methods (to be described in Section 10.3) must
be used. For the present we will illustrate the above method by an example.

Example 10.1. Consider the case of a sample of size 100 drawn from a
population with unit variance but unknown mean $\mu$. The quantity

$$ u = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = 10(\bar{x} - \mu) \tag{10.1} $$

is, in general, normally distributed with mean zero and unit variance, and
thus has a density function

$$ f(u) = \frac{1}{(2\pi)^{1/2}} \exp[-u^2/2], \tag{10.2} $$

which is independent of $\mu$. The probability that $u$ lies between any two
arbitrary values $u_1$ and $u_2$ is thus

$$ P[u_1 \leq u \leq u_2] = \int_{u_1}^{u_2} f(u)\, du. \tag{10.3} $$

For example, if $u_1 = -u_2 = -1.96$ then, from tables of the normal
distribution,

$$ P[-1.96 \leq u \leq 1.96] = \int_{-1.96}^{1.96} f(u)\, du = 0.95. \tag{10.4} $$

Using (10.1) this becomes

$$ P[\bar{x} - 0.196 \leq \mu \leq \bar{x} + 0.196] = 0.95. \tag{10.5} $$

Now suppose $\mu$ is estimated from the sample to be $\bar{x} = 1.0$, then we have

$$ P[0.804 \leq \mu \leq 1.196] = 0.95. \tag{10.6} $$

Equation (10.6) expresses the result that the probability that the random
interval 0.804 to 1.196 contains the true mean $\mu$ is 0.95. Thus, if samples of
size 100 were repeatedly drawn from the population, and if a random interval
was computed for each sample from (10.5), then 95% of those intervals
would be expected to contain the true mean. The interval (0.804 to 1.196 in
this case) is called a 95% confidence interval, and the probability, here 0.95,
is called the confidence coefficient.
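
The interval of Example 10.1 can be reproduced without tables by using the inverse of the normal distribution function. The sketch below relies on `statistics.NormalDist` from the Python standard library; the function name `mean_interval` is illustrative.

```python
from statistics import NormalDist

def mean_interval(xbar, sigma, n, coefficient):
    """Central confidence interval for the mean when sigma is known."""
    u = NormalDist().inv_cdf(0.5 + coefficient / 2.0)   # about 1.96 for 0.95
    half_width = u * sigma / n ** 0.5
    return xbar - half_width, xbar + half_width

low, high = mean_interval(1.0, 1.0, 100, 0.95)
print(round(low, 3), round(high, 3))
```

Changing `coefficient` shows the trade-off directly: a larger confidence coefficient gives a wider interval for the same sample.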

The above example leads directly to the following definition. 

Definition 10.1. A random interval $(J_1, J_2)$, depending only on the
observed data, and having the property that $100(1 - 2\alpha)\%$ of such intervals
computed will include the true value of the parameter being estimated, is
called the $100(1 - 2\alpha)\%$ confidence interval and $(1 - 2\alpha)$ is called the
confidence coefficient. (The reason for using the quantity $(1 - 2\alpha)$ will become
clear in Chapter 11.)

It is clear from Eqn (10.3) that there exist many pairs of numbers $u_1$ and
$u_2$ such that $P(u_1 \leq u \leq u_2)$ is a constant. The best confidence interval is
clearly the shortest, and for symmetric distributions of the type (10.2) this
condition is obtained by choosing $u_1$ and $u_2$ such that $f(u_1) = f(u_2)$. In other
cases the construction of confidence intervals which are shortest for a given
confidence coefficient is difficult, or may not even be possible.

Example 10.2. A practical problem which frequently arises is to put an
upper limit on the observation of a rare event. For example, if out of 1000
decays of a particle, 9 are observed to be of a type $E$ what can one say
about the branching ratio for $E$? The Poisson distribution is applicable here
and we have $\mu = \sigma^2 = 9$. However, we also know that for $\mu \geq 9$ the Poisson
distribution is well approximated by a normal distribution. Thus the quantity

$$ u = \frac{x - \mu}{\sigma} = \frac{x - 9}{3} $$

is a standard normal deviate. Thus, for example, from the tables in Appendix
D,

$$ P[u < 1.645] = 0.95, $$

and hence $x < 13.9$ and $B_E < 1.4\%$ with 95% confidence. Had fewer than 9
events been observed then the normal approximation could not have been
used. Thus, if no events had been seen, a branching ratio based on one
assumed event can be found from tables of the cumulative Poisson
distribution to be $B_E < 0.3\%$ with 92% confidence.
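
The first part of Example 10.2 amounts to a one-line calculation. The sketch below covers only the normal approximation, with the variance set equal to the observed count as in the example; the function name is illustrative, and the small-count case would need the cumulative Poisson distribution instead.

```python
from statistics import NormalDist

def upper_limit_normal(observed, confidence):
    """One-sided upper limit on a Poisson mean using the normal approximation,
    with the variance set equal to the observed count."""
    u = NormalDist().inv_cdf(confidence)      # about 1.645 for 0.95
    return observed + u * observed ** 0.5

limit = upper_limit_normal(9, 0.95)
branching_ratio_limit = limit / 1000          # 1000 decays observed in total
print(round(limit, 1), round(100 * branching_ratio_limit, 1))
```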

The concept of interval estimation for a single parameter may be extended
in a straightforward way to include simultaneous estimation of several
parameters. Thus, a $100(1 - 2\alpha)\%$ confidence region is a region constructed
from the sample such that, for repeatedly drawn samples, $100(1 - 2\alpha)\%$ of
the regions would be expected to contain the set of parameters under
estimation.

It should be remarked immediately that confidence intervals and regions
are essentially arbitrary, because they depend on what function of the
observations is chosen to be an estimator. This fact is easily illustrated by
reference to the normal distribution of Example 10.1. If we use the sample
mean as an estimator of the population mean, then for a confidence
coefficient of 0.95

$$ P\left[\bar{x} - \frac{1.96\sigma}{\sqrt{n}} \leq \mu \leq \bar{x} + \frac{1.96\sigma}{\sqrt{n}}\right] = 0.95, \tag{10.7} $$

and the length of the interval is $2 \times 1.96\,\sigma/\sqrt{n}$. However, we could also
use any given single observation to be an estimator, in which case the
confidence interval would be $(n)^{1/2}$ times as long. An important property of M.L.E.’s
is that, for large samples, they provide confidence intervals and regions
which are smaller, on average, than intervals and regions determined by any
other method of estimation of the parameters.

10.2 Normal distributions 

Because the normal distribution is of such importance we will consider 
obtaining confidence intervals for its parameters separately.

10.2.1. Confidence Intervals for the Mean


From Eqn (10.7) it is clear that a confidence interval for $\mu$ cannot be
calculated unless the variance is known. For large samples we could use an
estimate $\hat{\sigma}^2$ for this quantity without real loss of precision, but for small
samples this procedure is not satisfactory. The solution is to use the quantity

$$ t = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \tag{10.8} $$

(with $s^2$ the sample variance) which, from Theorem 6.5, has a $t$ distribution
with $(n - 1)$ degrees of freedom, and only involves $\mu$. Thus we can find a
number $t_\alpha$ such that

$$ P[-t_\alpha \leq t \leq t_\alpha] = \int_{-t_\alpha}^{t_\alpha} f(t; n - 1)\,dt = (1 - 2\alpha). \tag{10.9} $$

As in Example 10.1, we may now transform the inequality in (10.9) to give

$$ P[(\bar{x} - T_\alpha) \leq \mu \leq (\bar{x} + T_\alpha)] = (1 - 2\alpha), \tag{10.10} $$

where

$$ T_\alpha = \frac{t_\alpha\, s}{\sqrt{n}}. $$

The width of the interval is then $2T_\alpha$. The number $t_\alpha$ is called the $100\alpha\%$
level of $t$, and gives the point which cuts off $100\alpha\%$ of the area under the
curve $f(t)$ on the upper tail.
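
As an illustration of the interval (10.10), the sketch below computes a 95% interval for the mean of a small sample. The data values are invented for the example, $s^2$ is assumed to be the unbiased sample variance, and the critical value 2.776 (the upper 2.5% point of $t$ with 4 degrees of freedom) is entered by hand from standard tables, because the Python standard library has no $t$ quantile function.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(sample, t_alpha):
    """Return (xbar - T, xbar + T) with T = t_alpha * s / sqrt(n)."""
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)                  # square root of the unbiased sample variance
    half_width = t_alpha * s / sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical measurements (not from the text); n = 5, so 4 degrees of freedom.
data = [1.12, 1.13, 1.10, 1.09, 1.11]
T_025_4 = 2.776                        # upper 2.5% point of t, 4 d.o.f. (from tables)
low, high = t_interval(data, T_025_4)
print(f"{low:.4f} <= mu <= {high:.4f}")
```

For other sample sizes or confidence levels the tabulated value of $t_\alpha$ has to be changed accordingly.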

10.2.2. Confidence Intervals for the Variance 

To set up confidence intervals for the variance we use the $\chi^2$ distribution.
Thus the quantity

$$ \chi^2 = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \bar{x})^2, \tag{10.11} $$

has, from Theorem 6.2, a $\chi^2$ distribution with $(n - 1)$ degrees of freedom, and
so we can use it to find numbers $\chi_1^2$ and $\chi_2^2$ such that

$$ P[\chi_1^2 \leq \chi^2 \leq \chi_2^2] = \int_{\chi_1^2}^{\chi_2^2} f(\chi^2; n - 1)\,d\chi^2 = 1 - 2\alpha, $$

or, equivalently,

$$ P\left[\frac{1}{\chi_2^2} \sum_{i=1}^{n} (x_i - \bar{x})^2 \leq \sigma^2 \leq \frac{1}{\chi_1^2} \sum_{i=1}^{n} (x_i - \bar{x})^2\right] = 1 - 2\alpha. \tag{10.12} $$

Since the $\chi^2$ distribution is not symmetric, the shortest confidence interval
cannot be simply obtained for a given $\alpha$. However, provided the number of
degrees of freedom is not too small, a good approximation is to choose
$\chi_1^2$ and $\chi_2^2$ such that $100\alpha\%$ of the area of $f(\chi^2)$ is cut off from each tail,
i.e. such that

$$ \int_{\chi_1^2}^{\infty} f(\chi^2; n - 1)\,d\chi^2 = 1 - \alpha, $$

and

$$ \int_{\chi_2^2}^{\infty} f(\chi^2; n - 1)\,d\chi^2 = \alpha. $$

Such numbers can easily be obtained from tables of the $\chi^2$ distribution
function.
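
Where tables are not to hand, the two tail points can be found numerically. The following is a minimal sketch, not a library routine: `chi2_sf` evaluates the upper-tail probability of $\chi^2$ for an integer number of degrees of freedom by stepping up from the closed forms for 1 or 2 degrees of freedom, and `chi2_upper_point` inverts it by bisection. All function names and the data are assumptions made for the illustration.

```python
from math import erfc, exp, gamma, sqrt

def chi2_sf(x, dof):
    """P[chi^2 > x] for a chi-square variable with integer dof >= 1."""
    h = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, exp(-h)            # exact result for 2 d.o.f.
    else:
        a, q = 0.5, erfc(sqrt(h))      # exact result for 1 d.o.f.
    while a < dof / 2.0:               # each step adds two degrees of freedom
        q += h**a * exp(-h) / gamma(a + 1.0)
        a += 1.0
    return q

def chi2_upper_point(p, dof, hi=1000.0):
    """Value c with P[chi^2(dof) > c] = p, found by bisection."""
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, dof) > p:      # tail still too large: move right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def variance_interval(sample, alpha):
    """Equal-tail 100(1 - 2*alpha)% interval for sigma^2, as in Eqn (10.12)."""
    n = len(sample)
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)
    chi_1 = chi2_upper_point(1.0 - alpha, n - 1)   # alpha cut off in the lower tail
    chi_2 = chi2_upper_point(alpha, n - 1)         # alpha cut off in the upper tail
    return ss / chi_2, ss / chi_1

# Hypothetical measurements (not from the text).
data = [1.12, 1.13, 1.10, 1.09, 1.11]
low, high = variance_interval(data, 0.05)
print(f"{low:.2e} <= sigma^2 <= {high:.2e}")
```

For 4 degrees of freedom and $\alpha = 0.05$ the two points found this way agree with the tabulated values 0.711 and 9.488.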


10.2.3. Confidence Regions for the Mean and Variance 

In constructing a confidence region for the mean and variance
simultaneously we cannot use the region bounded by the limits of the confidence
intervals obtained separately for $\mu$ and $\sigma^2$ (a rectangle in the $(\mu, \sigma^2)$ plane),
i.e. Eqns (10.10) and (10.12) above, because the quantities $t$ of (10.8) and $\chi^2$
of (10.11) are not independently distributed, and hence the joint probability
that the two intervals contain the true parameter values is not equal to the
product of the separate probabilities. However, the distributions of $\bar{x}$ and
$\sum_i (x_i - \bar{x})^2$ are independent and may be used to construct the required
confidence region. Thus, for a $100(1 - 2\alpha)\%$ confidence region we may find
numbers $a_i$ such that

$$ P\left[a_1 \leq \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \leq a_2\right] = (1 - 2\alpha)^{1/2}, $$
$$ P\left[a_3 \leq \frac{\sum_i (x_i - \bar{x})^2}{\sigma^2} \leq a_4\right] = (1 - 2\alpha)^{1/2}. \tag{10.13} $$

The joint probability is then $(1 - 2\alpha)$ by virtue of the independence of the
variables. The region defined by (10.13) will not, in general, be the smallest
possible but will not differ much from the minimum (which is roughly
elliptical) unless the sample size is very small.

10.3 General method

The method used in Section 10.1 requires the existence of functions of 
the sample and parameters which are distributed independently of the 
parameters. This is its disadvantage, for in many cases such functions do 
not exist. However, for these cases there exists a more general method which 
we describe below. 

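Let $g(\hat{\theta}; \theta)$ be the density of $\hat{\theta}$, the estimator for samples of size $n$ of the
parameter $\theta$ in a population density $f(x; \theta)$. For a fixed value of $\theta$ a
$100(1 - 2\alpha)\%$ confidence interval for $\hat{\theta}$ is constructed from the expression

$$ P[h_1(\theta) \leq \hat{\theta} \leq h_2(\theta)] = \int_{h_1(\theta)}^{h_2(\theta)} g(\hat{\theta}; \theta)\,d\hat{\theta} = 1 - 2\alpha. \tag{10.14} $$

Equation (10.14) may be rewritten as

$$ P[\hat{\theta} < h_1(\theta)] = \int_{-\infty}^{h_1(\theta)} g(\hat{\theta}; \theta)\,d\hat{\theta} = \alpha, \tag{10.15} $$

and

$$ P[\hat{\theta} > h_2(\theta)] = \int_{h_2(\theta)}^{\infty} g(\hat{\theta}; \theta)\,d\hat{\theta} = \alpha, \tag{10.16} $$

which determine the functions $h_1(\theta)$ and $h_2(\theta)$. If the equations $\hat{\theta} = h_1(\theta)$
and $\hat{\theta} = h_2(\theta)$ are plotted the diagram shown in Fig. 10.1 results. A vertical
line through a particular value of $\theta$, say $\theta'$, intersects $h_1(\theta)$ and $h_2(\theta)$ at the
values $\hat{\theta}_1 = h_1(\theta')$ and $\hat{\theta}_2 = h_2(\theta')$ which are the $100(1 - 2\alpha)\%$ confidence
limits.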

To construct a confidence interval for $\theta$ we calculate an estimate from a
sample of size $n$, say $\hat{\theta}_n$. A horizontal line through $\hat{\theta}_n$ cuts the curves at values
$\theta_1$ and $\theta_2$ which, by construction, define the confidence limits, i.e.

$$ P[\theta_1 \leq \theta \leq \theta_2] = 1 - 2\alpha. \tag{10.17} $$

To find the curves $h_1(\theta)$ and $h_2(\theta)$ may be a lengthy procedure. However,
in some cases the values $\theta_1$ and $\theta_2$ may be obtained without knowing these
curves. From (10.15) and (10.16), $\theta_1$ and $\theta_2$ are solutions of the equations

$$ \int_{-\infty}^{\hat{\theta}_n} g(\hat{\theta}; \theta_2)\,d\hat{\theta} = \alpha, \qquad \int_{\hat{\theta}_n}^{\infty} g(\hat{\theta}; \theta_1)\,d\hat{\theta} = \alpha, \tag{10.18} $$

so if these equations can be solved the confidence interval results directly.

The general method given above can be extended to the case of confidence
regions for the $n$ parameters of the population $f(x; \theta_1, \ldots, \theta_n)$, i.e. that
region $R$ in the parameter space such that

$$ P[\theta_1, \theta_2, \ldots, \theta_n \text{ are contained in } R] = \int \cdots \int g(\hat{\theta}_1, \ldots, \hat{\theta}_n; \theta_1, \ldots, \theta_n) \prod_{i=1}^{n} d\hat{\theta}_i = 1 - 2\alpha, \tag{10.19} $$

but the resulting formulas are very complicated, and will not be considered
here.



Fig. 10.1. General method for finding confidence intervals.

Finally, we note that the method cannot be used to obtain confidence
regions for a subset $r$ of the $n$ parameters in the density $f(x; \theta_1, \ldots, \theta_n)$, except
for the case of large samples which is discussed in Section 10.4. In fact this
appears to be an unsolved problem for samples of arbitrary size.
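
For a single parameter, the two conditions that fix the limits, $P[\hat{\theta} \geq \hat{\theta}_n \mid \theta_1] = \alpha$ and $P[\hat{\theta} \leq \hat{\theta}_n \mid \theta_2] = \alpha$, can often be solved by a simple root search without constructing the curves $h_1$ and $h_2$. The sketch below does this for an observed Poisson count. This is an assumed illustration in which sums over the discrete count replace the integrals; the function names are not from the text.

```python
from math import exp

def poisson_cdf(k, mu):
    """P[X <= k] for a Poisson variable with mean mu (k >= 0)."""
    term = total = exp(-mu)
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total

def solve_decreasing(func, target, lo, hi, steps=200):
    """Bisection root of func(mu) = target, for func decreasing in mu."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if func(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def poisson_limits(n_obs, alpha):
    """Central 100(1 - 2*alpha)% limits (mu_1, mu_2) for n_obs >= 1 observed events."""
    top = 10.0 * n_obs + 20.0          # generous upper bracket for the search
    # Upper limit: P[X <= n_obs | mu_2] = alpha.
    mu_2 = solve_decreasing(lambda mu: poisson_cdf(n_obs, mu), alpha, 0.0, top)
    # Lower limit: P[X >= n_obs | mu_1] = alpha,
    # i.e. P[X <= n_obs - 1 | mu_1] = 1 - alpha.
    mu_1 = solve_decreasing(lambda mu: poisson_cdf(n_obs - 1, mu), 1.0 - alpha, 0.0, top)
    return mu_1, mu_2

mu_1, mu_2 = poisson_limits(9, 0.05)
print(f"{mu_1:.2f} <= mu <= {mu_2:.2f}")
```

Bisection is used only because the cumulative probability is monotonic in the parameter; any one-dimensional root finder would serve equally well.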

10.4 Case of large samples 

In Chapter 7 we have seen in Theorem 7.2 that the large sample
distribution of the M.L.E. $\hat{\theta}$ of a parameter $\theta$ in the density function $f(x; \theta)$ is
approximately normal about $\theta$ as mean. In this situation approximate confidence
intervals may be simply constructed. The method is to convert an inequality
of the form

$$ P\left[-u_\alpha \leq \frac{\hat{\theta} - \theta}{(\operatorname{var}\hat{\theta})^{1/2}} \leq u_\alpha\right] \approx 1 - 2\alpha, \tag{10.20} $$

for the distribution of $\hat{\theta}$ expressed in standard measure, to an inequality for
$\theta$ itself. We recall that $u_\alpha$ is defined by

$$ \frac{1}{\sqrt{2\pi}} \int_{-u_\alpha}^{u_\alpha} \exp\left(-\frac{u^2}{2}\right) du = 1 - 2\alpha. \tag{10.21} $$

We will illustrate the method by applying it to the binomial distribution. 

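Example 10.3. If we apply Eqn (7.27) to the binomial distribution of
Eqn (4.29) we find

$$ \operatorname{var}\hat{p} = \hat{\sigma}^2 = \frac{\hat{p}(1 - \hat{p})}{n}. \tag{10.22} $$

From (10.20) an approximate $(1 - 2\alpha)$ confidence interval is obtained by
considering the statement

$$ P\left[-u_\alpha \leq \frac{\hat{p} - p}{[\hat{p}(1 - \hat{p})/n]^{1/2}} \leq u_\alpha\right] \approx 1 - 2\alpha, \tag{10.23} $$

which may be rewritten as

$$ P\left[\hat{p} - u_\alpha \left(\frac{\hat{p}(1 - \hat{p})}{n}\right)^{1/2} \leq p \leq \hat{p} + u_\alpha \left(\frac{\hat{p}(1 - \hat{p})}{n}\right)^{1/2}\right] \approx 1 - 2\alpha. \tag{10.24} $$

Thus, for example,

$$ P\left[\hat{p} - 1.96 \left(\frac{\hat{p}(1 - \hat{p})}{n}\right)^{1/2} \leq p \leq \hat{p} + 1.96 \left(\frac{\hat{p}(1 - \hat{p})}{n}\right)^{1/2}\right] \approx 0.95 $$

gives an approximate 95% confidence interval for $p$.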
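
A minimal sketch of this large-sample binomial interval follows. The counts (40 successes in 100 trials) are invented for the illustration and the function name is an assumption; the normal quantile comes from the standard `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def binomial_interval(successes, n, conf=0.95):
    """Approximate interval p_hat -/+ u * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    u = NormalDist().inv_cdf(0.5 + conf / 2.0)   # about 1.96 for conf = 0.95
    half = u * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half, p_hat + half

# Hypothetical data: 40 successes in 100 trials.
low, high = binomial_interval(40, 100)
print(f"{low:.3f} <= p <= {high:.3f}")
```

The interval relies on the normal approximation, so it should only be trusted when $n$ is large and $\hat{p}$ is not close to 0 or 1.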

The above method may be extended to confidence regions by using
Theorem 7.5 in place of Theorem 7.2. Thus, in terms of the matrix $M_{ij}$ of
Eqn (7.34), we know that

$$ \chi^2 = \sum_{i=1}^{p} \sum_{j=1}^{p} (\hat{\theta}_i - \theta_i) M_{ij} (\hat{\theta}_j - \theta_j), \tag{10.25} $$

is approximately distributed as $\chi^2$ with $p$ degrees of freedom. So, just as
we used (10.21) for the normal distribution, we can use the $\alpha$ percentage
points of the $\chi^2$ distribution to set up a confidence region for the parameters
$\theta_i$. It is an ellipsoid with centre at $(\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_p)$.

At the end of Section 10.3 we remarked that it was not possible, in general,
to obtain a confidence region for a subset of the $p$ parameters for samples
of arbitrary size. However, for large samples this is possible. If we wish to
construct a region for a subset of $r$ parameters $(r < p)$ then the elements of
the matrix $M'_{ij}$ analogous to $M_{ij}$ above are given by

$$ M'_{ij} = (V'^{-1})_{ij}, \tag{10.26} $$

where the matrix $V'$ is obtained by removing the last $(p - r)$ rows and
columns in $V_{ij}$. The quadratic form

$$ \chi'^2 = \sum_{i=1}^{r} \sum_{j=1}^{r} (\hat{\theta}_i - \theta_i) M'_{ij} (\hat{\theta}_j - \theta_j), \tag{10.27} $$

is then approximately distributed as $\chi^2$ with $r$ degrees of freedom, and will
define an ellipsoid in the $\theta_i$ $(i = 1, \ldots, r)$ space.



11 Hypothesis Testing 


11.1 Introduction 

In Chapters 7-10 we have considered one of the two main branches of 
statistical inference, that of estimation. The other branch remaining to be 
discussed is that of hypothesis testing. We will start by defining what we 
mean by a statistical hypothesis. 

Definition 11.1. Consider a set of random variables $x_1, x_2, \ldots, x_n$, defining
a sample space $S$ of $n$ dimensions. If we denote a general point in the sample
space by $E$ then, if $R$ is any region in $S$, any hypothesis concerning the
probability that $E$ falls in $R$, i.e. $P(E \in R)$, is called a statistical hypothesis.
Furthermore, if the hypothesis determines $P(E \in R)$ completely then it is
called simple, otherwise it is called composite.

As an example, in testing the significance of the mean of a sample, it is 
a statistical hypothesis that the parent population is normal. Furthermore, 
if the parent population is postulated to have mean $\mu$ and variance $\sigma^2$ then
the hypothesis is simple, because the density function is then completely
determined.

We have already met the testing of an hypothesis when we discussed the
use of the $\chi^2$ and $F$ distributions to determine the number of parameters
needed for a satisfactory fit in the least-squares method of Chapter 8. The $\chi^2$
distribution can also be used to test the compatibility of repeated
measurements of a quantity.

Example 11.1. If two experiments give the following results for the value
of a parameter (assumed normal), $2.05 \pm 0.01$ and $2.09 \pm 0.02$, what can
one say about their compatibility?

The weighted mean of these results is $\bar{x} = 2.058$ and thus $\chi^2 = 3.20$.
Since we have estimated $\bar{x}$ from the data this value of $\chi^2$ is for one degree
of freedom, and so from the tables we find

$$ P[\chi^2(1) > 3.20] \approx 7\%. $$

Thus, a spread of this type is expected in approximately 7% of such
measurements.

The above compatibility test is frequently used on data. More formally,
what we are testing is the statistical hypothesis that the measurements both
come from the same normal distribution.
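
The compatibility test is easy to automate. The sketch below is an illustration with assumed function names; it weights each result by the inverse square of its error and evaluates the $\chi^2$ tail probability with a small helper valid for integer degrees of freedom. A direct evaluation for the two results of Example 11.1 gives a weighted mean of 2.058 and $\chi^2 = 3.2$, corresponding to a probability of about 7%.

```python
from math import erfc, exp, gamma, sqrt

def chi2_sf(x, dof):
    """P[chi^2 > x] for a chi-square variable with integer dof >= 1."""
    h = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, exp(-h)            # exact result for 2 d.o.f.
    else:
        a, q = 0.5, erfc(sqrt(h))      # exact result for 1 d.o.f.
    while a < dof / 2.0:               # each step adds two degrees of freedom
        q += h**a * exp(-h) / gamma(a + 1.0)
        a += 1.0
    return q

def compatibility(values, errors):
    """Weighted mean, chi-square about it, and its tail probability."""
    weights = [1.0 / e**2 for e in errors]
    wmean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    chi2 = sum(w * (v - wmean) ** 2 for w, v in zip(weights, values))
    # One degree of freedom is lost because the mean is estimated from the data.
    return wmean, chi2, chi2_sf(chi2, len(values) - 1)

wmean, chi2, prob = compatibility([2.05, 2.09], [0.01, 0.02])
print(f"mean = {wmean:.3f}, chi2 = {chi2:.2f}, P = {prob:.3f}")
```

The same function applies unchanged to any number of repeated measurements with known normal errors.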

The general procedure for testing an hypothesis is as follows. Assuming 
the hypothesis to be true, we can find a region R in the sample space S such 
that the probability of E falling in R is any preassigned value $\alpha$, called the
significance level. The region (S - R) is called the region of acceptance, and 
R is called the critical region, or region of rejection. If the observed event E 
falls in R we reject the hypothesis, otherwise we would accept it. In practice, 
as we shall discuss below, the critical region is determined by a statistic, the 
nature of which depends upon the hypothesis to be tested. The hypothesis 
under test is usually called the null hypothesis. 

By analogy with the discussion, in Chapter 10, on confidence intervals, 
there are many possible acceptance regions for a given hypothesis at a given 
significance level $\alpha$. For all of them the hypothesis will be rejected, although
true, in some cases. Such mistakes are called Type I errors and their
probability, denoted by P[I], is equal to the significance level of the test. It is
also possible that even though the hypothesis is false we fail to reject it. 
This is called a Type II error. We are led to the following definition of 
error probabilities. 

Definition 11.2. Let two hypotheses be $H_1: \theta \in \Theta_1$ and $H_2: \theta \in \Theta_2$, where
$\Theta_1$ and $\Theta_2$ are two mutually exclusive and exhaustive regions of the
parameter space. Further, let $S_1$ and $S_2$ be the acceptance and critical regions
of the sample space $S$ associated with the event $E = (x_1, x_2, \ldots, x_n)$,
assuming $H_1$ to be true. Then the probability of a Type I error is

$$ P[\mathrm{I}] = P[E \in S_2 \mid H_1: \theta \in \Theta_1], \tag{11.1} $$

and, if $H_1$ is false, but $H_2$ is true, the probability of a Type II error is

$$ P[\mathrm{II}] = P[E \in S_1 \mid H_2: \theta \in \Theta_2]. \tag{11.2} $$

From Eqns (11.1) and (11.2) we can define a quantity which may be used
to compare the relative merits of two tests.

Definition 11.3. The power of a statistical test is defined as

$$ \beta(\theta) = P[E \in S_2 \mid H_2: \theta \in \Theta_2] = 1 - P[E \in S_1 \mid H_2: \theta \in \Theta_2], \tag{11.3} $$

and is the probability of rejecting the hypothesis when it is, in fact, false.
From (11.1) and (11.2) it follows that

$$ \beta(\theta) = \begin{cases} P[\mathrm{I}], & \theta \in \Theta_1, \\ 1 - P[\mathrm{II}], & \theta \in \Theta_2. \end{cases} \tag{11.4} $$


11.2 General hypotheses: likelihood ratios 

There are many techniques which have been devised to test statistical
hypotheses. For example, in Section 6.1 we encountered the chi-square
distribution which may be used as a “goodness-of-fit” test, i.e. may be used
to test the hypothesis that a sample comes from a particular parent
population. In this subsection we shall concentrate on tests based on likelihood
ratios and we will start by discussing the simplest of all possible situations.

11.2.1. Simple Hypothesis: One Simple Alternative 

This case is not very useful in practice but will serve as an introduction 
to the general methods. Firstly, we will define the likelihood ratio test. 


Definition 11.4. A likelihood ratio test is used to decide between a simple
null hypothesis $H_0: \theta = \theta_0$ and the simple alternative $H_a: \theta = \theta_a$. If we
define the likelihood ratio $\lambda$ for a sample of size $n$ by

$$ \lambda = \frac{\prod_{i=1}^{n} f(x_i; \theta_0)}{\prod_{i=1}^{n} f(x_i; \theta_a)} = \frac{L(\theta_0)}{L(\theta_a)}, \tag{11.5} $$

then, for a fixed $k$, the test is

for $\lambda > k$, accept $H_0$,
for $\lambda < k$, reject $H_0$,

and for $\lambda = k$, either action is taken.

The use of this test is illustrated by the following example. 

Example 11.2. We will test the null hypothesis $H_0: \theta = 2$ against the
alternative $H_a: \theta = 0$ for the normal population with unit variance

$$ f(x; \theta) = \frac{1}{(2\pi)^{1/2}} \exp[-(x - \theta)^2/2]. $$

From (11.5)

$$ \lambda = \exp[2n\bar{x} - 2n], $$

and thus, from Definition 11.4, $H_0$ is accepted if $\lambda > k$, i.e. if

$$ \bar{x} > c = \frac{\ln k}{2n} + 1, \tag{11.6} $$

and rejected if

$$ \bar{x} < c. \tag{11.7} $$

Fig. 11.1. Error probabilities for Example 11.2.

In this simple example the error probabilities are given by the shaded
areas in Fig. 11.1. To find the point for which P[I] is a given value for a
fixed value of $n$ we note that when $\theta = 2$

$$ P[\mathrm{I}] = P[\bar{x} < c \mid \theta = 2] = \alpha, $$

so, for $\alpha = 0.05$ and $n = 4$ (say), using the tables in Appendix D gives
$c = 1.1775$, and hence

$$ P[\mathrm{II}] = P[\bar{x} > 1.1775 \mid \theta = 0] = 0.009. $$

It is also possible to find the sample size necessary to obtain fixed values of
P[I] and P[II]. Thus, for example, if P[I] = 0.02 and P[II] = 0.01 we
require that

$$ c = \theta_0 - \frac{2.054}{\sqrt{n}} = 2 - \frac{2.054}{\sqrt{n}} $$

and

$$ c = \theta_a + \frac{2.326}{\sqrt{n}} = \frac{2.326}{\sqrt{n}} $$

simultaneously. This gives $n = 4.8$, and so a sample size of 5 would suffice.
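
The numbers in Example 11.2 can be reproduced with the standard normal distribution from Python's `statistics` module. The function names below are illustrative assumptions; the code takes unit variance and $\theta_0 > \theta_a$, as in the example.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def critical_point(theta0, n, alpha):
    """c such that P[xbar < c | theta0] = alpha, for unit variance."""
    return theta0 - Z.inv_cdf(1.0 - alpha) / sqrt(n)

def type2_probability(c, theta_a, n):
    """P[xbar > c | theta_a], for unit variance."""
    return 1.0 - Z.cdf((c - theta_a) * sqrt(n))

def required_n(theta0, theta_a, p1, p2):
    """Sample size (not rounded) giving P[I] = p1 and P[II] = p2."""
    root_n = (Z.inv_cdf(1.0 - p1) + Z.inv_cdf(1.0 - p2)) / (theta0 - theta_a)
    return root_n ** 2

c = critical_point(2.0, 4, 0.05)
p2 = type2_probability(c, 0.0, 4)
n_req = required_n(2.0, 0.0, 0.02, 0.01)
print(f"c = {c:.4f}, P[II] = {p2:.3f}, n = {n_req:.1f} -> {ceil(n_req)}")
```

The critical point comes out as 1.1776 rather than 1.1775 because the code uses the unrounded quantile instead of the table value 1.645; the conclusions are unchanged.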

The likelihood ratio test as used in the above example concentrates on 
controlling only Type I errors. A better test is one which, for a null
hypothesis $H_0: \theta \in \Theta_0$ with an alternative $H_a: \theta \in \Theta_a$, gives

$$ P[\mathrm{I}] \leq \alpha, \quad \text{for all } \theta \in \Theta_0, $$

and maximizes the power

$$ \beta(\theta) = 1 - P[\mathrm{II}], \quad \text{for all } \theta \in \Theta_a. $$

For the case of a simple null hypothesis and a simple alternative such a test
is provided by the following theorem.

Theorem 11.1. The critical region $R_k$ which, for a fixed significance level $\alpha$,
maximizes the power of the test of the null hypothesis $H_0: \theta = \theta_0$ against the
alternative $H_a: \theta = \theta_a$, where $x_1, \ldots, x_n$ is a sample of size $n$ from a density
$f(x; \theta)$, is that region for which the likelihood ratio

$$ \lambda = \frac{L(\theta_0)}{L(\theta_a)} < k, \tag{11.8} $$

for a fixed number $k$, and

$$ \int \cdots \int_{R_k} \prod_{i=1}^{n} f(x_i; \theta_0)\,dx_i = \alpha. \tag{11.9} $$

This result is known as the Neyman-Pearson Lemma.

Proof. Our object is to find the region $R$ which maximizes the power

$$ \beta = \int_R dx\, L(\theta_a), $$

subject to the condition implied by Eqn (11.9), i.e.

$$ \int_R dx\, L(\theta_0) = \alpha. $$

Consider the region $R_k$ defined to be that where the likelihood ratio

$$ \lambda = \frac{L(\theta_0)}{L(\theta_a)} < k. $$

In $R_k$ it follows that

$$ \int_{R_k} dx\, L(\theta_a) > \frac{1}{k} \int_{R_k} dx\, L(\theta_0). $$

But for all regions Eqn (11.9) must hold, and so we have, for any region $R$

$$ \int_{R_k} dx\, L(\theta_a) > \frac{1}{k} \int_R dx\, L(\theta_0). $$

Now for a region $R$ outside $R_k$

$$ \lambda = \frac{L(\theta_0)}{L(\theta_a)} > k, $$

and hence

$$ \frac{1}{k} \int_R dx\, L(\theta_0) > \int_R dx\, L(\theta_a). $$

Combining these two inequalities gives

$$ \int_{R_k} dx\, L(\theta_a) > \int_R dx\, L(\theta_a), $$

which is true for any $R$, and all $R_k$ such that $\lambda < k$. Thus $R_k$ is the required
critical region.

We have seen in Example 11.2 that the critical region for testing the null
hypothesis $H_0: \theta = 0$ against the alternative $H_a: \theta = 2$ is given by

$$ \bar{x} > c. $$

Now if $H_0$ is true the distribution of $\bar{x}$ is normal with zero mean and
variance $1/n$. Thus from Eqn (11.9) with $\alpha = 0.05$,

$$ \left(\frac{n}{2\pi}\right)^{1/2} \int_c^{\infty} \exp\left(-\frac{n\bar{x}^2}{2}\right) d\bar{x} = 0.05, $$

giving $c = 1.645/\sqrt{n}$. Thus $H_0$ is to be rejected if $\bar{x} > 1.645/\sqrt{n}$.

We will illustrate the use of the Neyman-Pearson Lemma by another 
example, this time involving the Cauchy density of Section 4.4. 


Example 11.3. Consider the Cauchy distribution

$$ f(x; \theta) = \frac{1}{\pi} \cdot \frac{1}{1 + (x - \theta)^2}. $$

We will determine the critical region for a sample of size one for testing the
null hypothesis $H_0: \theta = 0$, against the alternative $H_a: \theta = 2$, defined by a
likelihood ratio $\lambda = 5$.

Now we have, from the definition of a likelihood ratio,

$$ \lambda = \frac{L(\theta = 0)}{L(\theta = 2)} = \frac{1 + (x - 2)^2}{1 + x^2}, $$

and we require $\lambda < 5$. This implies that the critical regions for the test are

$$ x > 0 \quad \text{and} \quad x < -1. $$

Furthermore, from Theorem 11.1, the significance level of the test is

$$ \alpha = P[\mathrm{I}] = \int_{-\infty}^{-1} dx\, L(\theta = 0) + \int_{0}^{\infty} dx\, L(\theta = 0) $$
$$ = \frac{1}{\pi} \int_{-\infty}^{-1} \frac{dx}{1 + x^2} + \frac{1}{\pi} \int_{0}^{\infty} \frac{dx}{1 + x^2} = 0.75. $$

Finally, we may find the probability of a Type II error. From Definition
11.2 this is

$$ P[\mathrm{II}] = \frac{1}{\pi} \int_{-1}^{0} \frac{dx}{1 + (x - 2)^2} = 0.045, $$

and thus the power of the test is

$$ \beta = 1 - P[\mathrm{II}] = 0.955. $$
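
Because the Cauchy distribution function has the closed form $\tfrac{1}{2} + \pi^{-1}\arctan(x - \theta)$, the probabilities of Example 11.3 can be checked directly. The helper names below are assumptions for the illustration.

```python
from math import atan, pi

def cauchy_cdf(x, theta):
    """Distribution function of the Cauchy density centred at theta."""
    return 0.5 + atan(x - theta) / pi

def likelihood_ratio(x):
    """lambda = L(theta=0) / L(theta=2) for a single observation x."""
    return (1.0 + (x - 2.0) ** 2) / (1.0 + x ** 2)

# Critical region lambda < 5, i.e. x < -1 or x > 0.
alpha = cauchy_cdf(-1.0, 0.0) + (1.0 - cauchy_cdf(0.0, 0.0))
# Type II error: x falls in the acceptance region [-1, 0] when theta = 2.
p_type2 = cauchy_cdf(0.0, 2.0) - cauchy_cdf(-1.0, 2.0)
power = 1.0 - p_type2
print(f"alpha = {alpha:.2f}, P[II] = {p_type2:.3f}, power = {power:.3f}")
```

The likelihood ratio equals 5 exactly at $x = -1$ and $x = 0$, the boundaries of the critical region, which `likelihood_ratio` confirms.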
11.2.2. Composite Hypotheses

The cases considered in Section 11.2.1 above are really only useful for 
illustrative purposes. More realistic situations usually involve composite 
hypotheses. The first situation which we will consider is when the null 
hypothesis is simple and the alternative is composite, but may be regarded 
as an aggregate of simple hypotheses. If the alternative is $H_a$, then for each
of the simple hypotheses in $H_a$, say $H_a'$, we may construct, for a given $\alpha$,
a region $R$ for testing $H_0$ against $H_a'$. However $R$ will vary from one value
of $H_a'$ to the next and we are, therefore, faced with the problem of
determining the best critical region for all the hypotheses $H_a'$. Such a region is
called the uniformly most powerful (U.M.P.), and a U.M.P. test is defined
as follows.

Definition 11.5. A test of the null hypothesis $H_0: \theta \in \Theta_0$ against the
alternative $H_a: \theta \in \Theta_a$, is called a uniformly most powerful (U.M.P.) test at
the significance level $\alpha$ if the critical region of the test is such that

$$ P[\mathrm{I}] \leq \alpha \quad \text{for all } \theta \in \Theta_0, $$
$$ \beta(\theta) = 1 - P[\mathrm{II}] \quad \text{is a maximum for each } \theta \in \Theta_a. $$

The following simple example will illustrate how such a U.M.P. test may be 
constructed. 


Example 11.4. We will test the null hypothesis $H_0: \mu = \mu_0$ against the
alternative $H_a: \mu > \mu_0$, for a normal distribution with unit variance. The
hypothesis $H_a$ may be regarded as an aggregate of hypotheses $H_a'$ of the
form $H_a': \mu = \mu_a$ where $\mu_a > \mu_0$. The likelihood ratio for testing $H_0$ against
$H_a'$ is

$$ \lambda = \exp\{-\tfrac{1}{2}[2n\bar{x}(\mu_a - \mu_0) + n(\mu_0^2 - \mu_a^2)]\}. $$

Theorem 11.1 may now be applied for a given $k$, and gives the critical
region as

$$ \bar{x} > c = \frac{-\ln k}{n(\mu_a - \mu_0)} + \tfrac{1}{2}(\mu_0 + \mu_a). $$

Thus the critical region is of the form $\bar{x} > c$ regardless of the value of $\mu_a$
provided $\mu_a > \mu_0$. Thus to reject $H_0$ if $\bar{x} > c$ tests $H_0$ against $H_a: \mu > \mu_0$.
The number $c$ may be found from

$$ P[\mathrm{I}] = \alpha = \left(\frac{n}{2\pi}\right)^{1/2} \int_c^{\infty} \exp\left[-\frac{n}{2}(\bar{x} - \mu_0)^2\right] d\bar{x}, $$

by substituting $u = \sqrt{n}(\bar{x} - \mu_0)$. Thus, for example, if $\alpha = 0.05$,
$c = 1.645/\sqrt{n} + \mu_0$.

A more complicated situation which can occur is in testing one composite 
hypothesis against another, e.g. in testing the null hypothesis $H_0: \theta_1 < \theta < \theta_2$
against $H_a: \theta < \theta_1,\ \theta > \theta_2$. In such cases a U.M.P. test does not exist, and
other tests must be devised whose power is not too inferior to the
maximum-power tests. A useful method is to construct a test having desirable
large-sample properties and hope that it is still reasonable for small samples.
One such test is the generalized likelihood ratio described below.

Definition 11.6. Let $x_1, \ldots, x_n$ be a sample of size $n$ from a population
density $f(x; \theta_1, \ldots, \theta_p)$ where $S$ is the parameter space. Let the null hypothesis
be $H_0: (\theta_1, \theta_2, \ldots, \theta_p) \in R$, and the alternative be $H_a: (\theta_1, \theta_2, \ldots, \theta_p) \in (S - R)$.
Then, if the likelihood of the sample is denoted by $L(S)$, and its maximum
value with respect to the parameters in the region $S$ denoted by $L(\hat{S})$, the
generalized likelihood ratio is given by

$$ \lambda = \frac{L(\hat{R})}{L(\hat{S})}, \tag{11.10} $$

and $0 < \lambda < 1$. Furthermore, if P[I] = $\alpha$, then the critical region for the
generalized likelihood ratio test is $0 < \lambda < A$ where

$$ \int_0^A g(\lambda \mid H_0)\,d\lambda = \alpha, $$

and $g(\lambda \mid H_0)$ is the density of $\lambda$ when the null hypothesis $H_0$ is true.
We will illustrate the general method by an example.


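Example 11.5. We shall test the null hypothesis $H_0: \mu = 3$ against $H_a:
\mu \neq 3$, for the normal density with unit variance. In this example the region
$R$ is a single point $\mu = 3$, and $(S - R)$ is the rest of the real axis. The
likelihood is

$$ L = \left(\frac{1}{2\pi}\right)^{n/2} \exp\left[-\frac{1}{2} \sum_{i=1}^{n} (x_i - \bar{x})^2 - \frac{n}{2}(\bar{x} - \mu)^2\right], \tag{11.11} $$

and the maximum value of $L(S)$ is obtained when $\mu = \bar{x}$, i.e.

$$ L(\hat{S}) = \left(\frac{1}{2\pi}\right)^{n/2} \exp\left[-\frac{1}{2} \sum_{i=1}^{n} (x_i - \bar{x})^2\right]. \tag{11.12} $$

Similarly,

$$ L(\hat{R}) = \left(\frac{1}{2\pi}\right)^{n/2} \exp\left[-\frac{1}{2} \sum_{i=1}^{n} (x_i - \bar{x})^2 - \frac{n}{2}(\bar{x} - 3)^2\right], \tag{11.13} $$

and so the likelihood ratio is

$$ \lambda = \exp\left[-\frac{n}{2}(\bar{x} - 3)^2\right]. \tag{11.14} $$

If we use $\alpha = 0.05$, the critical region for the test is given by $0 < \lambda < A$
where

$$ \int_0^A g(\lambda \mid H_0)\,d\lambda = 0.05. $$

Now if $H_0$ is true, $\bar{x}$ is normally distributed with mean 3 and variance $1/n$,
and $n(\bar{x} - 3)^2$ is then distributed as chi-square with one degree of freedom.
From (11.14) it follows that $(-2 \ln \lambda)$ is also distributed as chi-square with
one degree of freedom. Setting $\chi^2 = -2 \ln \lambda$, and using the tables then gives

$$ 0.05 = \int_0^A g(\lambda \mid H_0)\,d\lambda = \int_{-2 \ln A}^{\infty} f(\chi^2; 1)\,d\chi^2 = \int_{3.84}^{\infty} f(\chi^2; 1)\,d\chi^2. $$

Thus the critical region is defined by $-2 \ln \lambda > 3.84$, i.e.

$$ n(\bar{x} - 3)^2 > 3.84, $$

or

$$ \bar{x} > 3 + \frac{1.96}{\sqrt{n}} \quad \text{and} \quad \bar{x} < 3 - \frac{1.96}{\sqrt{n}}. \tag{11.15} $$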
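
As a sketch of how this test would be applied in practice, the function below evaluates $\lambda$ and $-2 \ln \lambda$ from Eqn (11.14) and compares the latter with the tabulated 5% point 3.84. The sample values are invented for the illustration and the function name is an assumption.

```python
from math import exp, sqrt
from statistics import mean

CHI2_05_1 = 3.84   # upper 5% point of chi-square with 1 d.o.f. (from tables)

def glr_test(sample, mu0=3.0, crit=CHI2_05_1):
    """Generalized likelihood ratio test of mu = mu0 for unit-variance normal data."""
    n = len(sample)
    xbar = mean(sample)
    minus_two_ln_lambda = n * (xbar - mu0) ** 2      # from Eqn (11.14)
    lam = exp(-0.5 * minus_two_ln_lambda)
    return lam, minus_two_ln_lambda, minus_two_ln_lambda > crit

# Hypothetical sample of size 4 (values invented for illustration).
lam, stat, reject = glr_test([4.1, 3.6, 4.4, 3.9])
print(f"lambda = {lam:.3f}, -2 ln(lambda) = {stat:.2f}, reject H0: {reject}")
```

Since $\sqrt{3.84} \approx 1.96$, the comparison is equivalent to the two-sided condition on $\bar{x}$ in (11.15).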

We mentioned above that the generalized likelihood ratio test has useful 
large-sample properties. This can be stated in the following theorem. 

Theorem 11.2. Let $x_1, x_2, \ldots, x_n$ be a random sample of size $n$ drawn from
a density $f(x; \theta_1, \ldots, \theta_p)$. Further, let the null hypothesis be

$$ H_0: \theta_i = \bar{\theta}_i, \quad i = 1, 2, \ldots, k \leq p, $$

with the alternative

$$ H_a: \theta_i \neq \bar{\theta}_i. $$

Then, when $H_0$ is true, $-2 \ln \lambda$ is approximately distributed as chi-square
with $k$ degrees of freedom, if $n$ is large.

To use this theorem to test the null hypothesis $H_0$ with P[I] = $\alpha$ we need
only compute $-2 \ln \lambda$ from the sample, and compare it with the $\alpha$ level of
the chi-square distribution. If $-2 \ln \lambda$ exceeds the $\alpha$ level $H_0$ is rejected,
if not $H_0$ is accepted.

11.2.3. Fitting and Comparing Distributions 

In Section 9.1 we introduced the method of estimation known as “minimum
chi-square”, and at the beginning of the chapter we briefly discussed
how the same technique could be used to test the compatibility of repeated
measurements. In the latter applications we are testing the statistical
hypotheses that estimates produced by the measurement process all come from
the same populations. Such procedures, for obvious reasons, are known as
goodness-of-fit tests, and we shall discuss them further in this section.

Consider, firstly, the case of a discrete random variable $x$ which can take
on a finite number of values $x_i$ $(i = 1, 2, \ldots, k)$ with corresponding
probabilities $p_i$ $(i = 1, 2, \ldots, k)$. We will test the null hypothesis

$$ H_0: p_i = \pi_i, \quad i = 1, 2, \ldots, k, $$

against the alternative

$$ H_a: p_i \neq \pi_i, $$

where the $\pi_i$ are specified fixed values, and

$$ \sum_{i=1}^{k} \pi_i = 1. $$

The likelihood function for a sample of size $n$ is

$$ L(p) = \prod_{i=1}^{k} p_i^{f_i}, $$

where $f_i = f(x_i)$ is the observed frequency of the value $x_i$. The maximum
value of $L(p)$ if $H_0$ is true is just

$$ \max L(p) = L(\pi) = \prod_{i=1}^{k} \pi_i^{f_i}. \tag{11.16} $$

To find the maximum value of $L(p)$ if $H_a$ is true we need to know the
M.L.E. of $p$, i.e. $\hat{p}$. Thus we have to maximize

$$ \ln L(p) = \sum_{i=1}^{k} f_i \ln p_i, $$

subject to the constraint

$$ \sum_{i=1}^{k} p_i = 1. $$

Introducing the Lagrange multiplier $\Lambda$, the variation function becomes

$$ P = \ln L(p) - \Lambda \left[\sum_{i=1}^{k} p_i - 1\right], $$

and setting

$$ \frac{\partial P}{\partial p_j} = 0, $$

gives

$$ \hat{p}_j = f_j/\Lambda. $$

Now since

$$ \sum_{j=1}^{k} \hat{p}_j = 1 = \frac{1}{\Lambda} \sum_{j=1}^{k} f_j = \frac{n}{\Lambda}, $$

we have that the required M.L.E. is

$$ \hat{p}_j = f_j/n. $$

Thus the maximum value of $L(p)$ if $H_a$ is true is

$$ L(\hat{p}) = \prod_{i=1}^{k} \left(\frac{f_i}{n}\right)^{f_i}, \tag{11.17} $$

and the likelihood ratio is therefore, from (11.16) and (11.17),

$$ \lambda = \frac{L(\pi)}{L(\hat{p})} = n^n \prod_{i=1}^{k} \left(\frac{\pi_i}{f_i}\right)^{f_i}. \tag{11.18} $$

Finally, $H_0$ is rejected if $\lambda < \lambda_c$, where $\lambda_c$ is a given fixed value of $\lambda$, and $H_0$
is accepted if $\lambda > \lambda_c$.

Example 11.6. A die is thrown 60 times to test whether it is “true”, and
the resulting frequencies of the faces are as shown in the table.

Face        1   2   3   4   5   6
Frequency   9   8  12  11   6  14

From Eqn (11.18),

$$ -2 \ln \lambda = -2\left[n \ln n + \sum_{i=1}^{k} f_i \ln\left(\frac{\pi_i}{f_i}\right)\right], $$

where

$$ \pi_i = 1/6; \quad n = 60; \quad \text{and } k = 6. $$

Thus,

$$ -2 \ln \lambda = -2[60 \ln 60 - 60 \ln 6 - 9 \ln 9 - \cdots - 14 \ln 14] \approx 4.3. $$

From Theorem 11.2 the quantity $-2 \ln \lambda$ is approximately distributed as
$\chi^2$ with 5 degrees of freedom. (Note that there are only 5 degrees of freedom,
not 6, because of the constraint $\sum p_i = 1$.) From tables of the $\chi^2$ distribution
we find e.g.,

$$ \chi^2_{0.1}(5) = 9.24, $$

and so we can certainly accept the hypothesis that the die is true, i.e.

$$ H_0: p_i = 1/6, \quad i = 1, 2, \ldots, 6, $$

at the 10% level.
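
The statistic for the die can be evaluated directly from the frequencies. In the sketch below (function names are assumptions) $-2 \ln \lambda$ is written in the equivalent form $2 \sum f_i \ln[f_i/(n\pi_i)]$, and the tail probability for 5 degrees of freedom is computed with a small helper instead of being read from tables.

```python
from math import erfc, exp, gamma, log, sqrt

def chi2_sf(x, dof):
    """P[chi^2 > x] for a chi-square variable with integer dof >= 1."""
    h = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, exp(-h)            # exact result for 2 d.o.f.
    else:
        a, q = 0.5, erfc(sqrt(h))      # exact result for 1 d.o.f.
    while a < dof / 2.0:               # each step adds two degrees of freedom
        q += h**a * exp(-h) / gamma(a + 1.0)
        a += 1.0
    return q

def minus_two_ln_lambda(freqs, probs):
    """-2 ln(lambda) = 2 * sum f_i * ln(f_i / (n * pi_i)), from Eqn (11.18)."""
    n = sum(freqs)
    # Cells with f_i = 0 contribute nothing to the sum.
    return 2.0 * sum(f * log(f / (n * p)) for f, p in zip(freqs, probs) if f > 0)

freqs = [9, 8, 12, 11, 6, 14]
stat = minus_two_ln_lambda(freqs, [1.0 / 6.0] * 6)
p_value = chi2_sf(stat, len(freqs) - 1)
print(f"-2 ln(lambda) = {stat:.2f}, P[chi2(5) > stat] = {p_value:.2f}")
```

The tail probability is about one half, so the observed frequencies are entirely typical of a true die.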

For the case of continuous distributions the null hypothesis is usually
that a population is described by a certain density function $f(x)$. This
hypothesis may be tested approximately by dividing the observations into $k$
intervals and then using the method described above to compare the observed
interval frequencies with those predicted by the postulated density function
$f(x)$. Then, if $f_i$ is the frequency found in the $i$th interval, it can be shown
that

$$ \chi^2 = \sum_{i=1}^{k} \frac{(f_i - n\pi_i)^2}{n\pi_i} $$

is asymptotically distributed as $\chi^2$ with $k - 1$ degrees of freedom. If $\pi_i$ is
unknown, but is estimated from the sample in terms of $r$ parameters to be
$\hat{\pi}_i$ then the statistic

$$ \chi^2 = \sum_{i=1}^{k} \frac{(f_i - n\hat{\pi}_i)^2}{n\hat{\pi}_i}, $$

is also distributed as $\chi^2$ but now with $k - 1 - r$ degrees of freedom.

In using the chi-square test of a continuous distribution with unknown
parameters one has always to be careful that the method of estimating the
parameters still leads to an asymptotic $\chi^2$ distribution. In general, this will
not be true if the parameters are estimated either from the original data
or from the grouped data. The correct procedure is to estimate the
parameters $\theta$ by the M.L. method using the likelihood function

$$ L(\theta) = \prod_{i=1}^{k} [p_i(\theta)]^{f_i}, $$

where $p_i$ is the appropriate density function. Such estimates are usually
difficult to obtain, but if one uses the simple estimates then one may be
working at a higher significance level than intended. This will happen, for
example, in the case of the normal distribution.
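
For comparison with the likelihood ratio statistic, the grouped-frequency $\chi^2$ statistic with fully specified probabilities $\pi_i$ can be evaluated for the die data of Example 11.6. This is a minimal sketch; the function name is an assumption.

```python
def pearson_chi2(freqs, probs):
    """Sum over cells of (f_i - n*pi_i)^2 / (n*pi_i)."""
    n = sum(freqs)
    return sum((f - n * p) ** 2 / (n * p) for f, p in zip(freqs, probs))

stat = pearson_chi2([9, 8, 12, 11, 6, 14], [1.0 / 6.0] * 6)
print(f"chi2 = {stat:.1f} with 5 degrees of freedom")
```

The result, 4.2, is close to the value of $-2 \ln \lambda$ found for the same data, as expected when both statistics have the same asymptotic distribution.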

11.3 Normal distribution 

Because of the great importance of the normal distribution we shall give, 
in this section, some more details concerning tests involving this distribution. 

11.3.1. Introduction 

We will begin by considering the normal distribution for the situation 
where the population variance is known. This, of course, is not a very 
practical example but will serve as an introduction to more realistic cases 
which we will consider later. 

We will thus assume that we have a normal population with density
$n(x; \mu, \sigma^2)$ where the mean is unknown, but the variance is known. For a
sample of size $n$ drawn from the population we may compute the sample
mean $\bar{x}$, and we have previously seen that

$$ E[\bar{x}] = \mu, \quad \text{and} \quad \operatorname{var}(\bar{x}) = \sigma^2/n. $$

Furthermore, by the Central Limit Theorem, the variable

$$ W = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} $$

has a density $n(W; 0, 1)$. We may now ask the question: what is the
probability that $|W|$, as calculated from the sample, is greater than some specified
value $W_\gamma$, where $W_\gamma$ is defined by

$$ P[W > W_\gamma] = \gamma. \tag{11.19} $$

Now, from the definition of the probability distribution function, we
have that

$$ P[|W| \geq W_\gamma] = \frac{1}{\sqrt{2\pi}} \left(\int_{-\infty}^{-W_\gamma} + \int_{W_\gamma}^{\infty}\right) \exp\left(-\frac{t^2}{2}\right) dt = 2[1 - N(W_\gamma; 0, 1)], $$

and from Eqn (11.19) this is equal to $2\gamma$. Substituting for $W$ then gives the
result

$$ P\left[\mu - \frac{\sigma W_\gamma}{\sqrt{n}} \leq \bar{x} \leq \mu + \frac{\sigma W_\gamma}{\sqrt{n}}\right] = 2N(W_\gamma; 0, 1) - 1. \tag{11.20} $$

This inequality may either be looked upon as establishing a confidence
interval for $\mu$, or as forming the basis of a test of the hypothesis

$$ H_0: \mu = \mu_0; \quad f(x) = n(x; \mu, \sigma^2), $$

against the alternative

$$ H_a: \mu \neq \mu_0; \quad f(x) = n(x; \mu, \sigma^2). $$

If we consider the latter possibility then we would be led to reject the null
hypothesis if the calculated quantity

$$ W_0 = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \tag{11.21} $$

is greater than $W_\gamma$ in modulus, i.e. reject $H_0$ if $|W_0| > W_\gamma$.
The above is a typical example of a two-tailed test, so-called because in
such tests the probability of a Type I error

$$ P[\mathrm{I}] = \alpha = 2\gamma, $$

is the sum of the areas in the two tails of the normal distribution. If the
alternative hypothesis was

$$ H_a: \mu > \mu_0; \quad f(x) = n(x; \mu, \sigma^2), $$

then P[I] is the area under only one of the tails of the distribution, and the
significance level of the test is thus

$$ P[\mathrm{I}] = \alpha = \gamma. $$

Such a test is called a one-tailed test.

Having fixed P[I] we now have to consider the probability of a Type II
error. This will, of course, generally depend on the alternative hypothesis,
so for definiteness we will consider the above null hypothesis with the
specific alternative

$$ H_a: \mu = \mu_a; \quad f(x) = n(x; \mu, \sigma^2). $$

Our aim is thus to find the probability that the test statistic $W_0$ falls within
the acceptance region if the alternative hypothesis is true. We will write this as

$$ P[-W_{\alpha/2} < W_0 < W_{\alpha/2} : H_a], $$

and in terms of the power, defined in Eqn (11.3),

$$ P[-W_{\alpha/2} < W_0 < W_{\alpha/2} : H_a] = 1 - \beta. \tag{11.22} $$

By analogy with the definition of $W_0$ we shall define

$$ W_a = \frac{\bar{x} - \mu_a}{\sigma/\sqrt{n}}, $$

and if $H_a$ is true then $W_a$ has a normal distribution. Furthermore, since

$$ W_0 = W_a + \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}}, \tag{11.23} $$

we may construct an inequality for $W_a$ by substituting (11.23) into (11.22).
This gives

$$ 1 - \beta = P\left[-W_{\alpha/2} - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} < W_a < W_{\alpha/2} - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}}\right]. \tag{11.24} $$

Equation (11.24) shows that if $\mu_a - \mu_0$ is small then

$$ P[\mathrm{II}] \approx 1 - P[\mathrm{I}], $$

and hence

$$ \beta \approx \alpha. $$

Thus the power of the test will be very low. This situation can only be
improved by making $\mu_a - \mu_0$ large, or by having $n$ large. This is in accord
with the common-sense view that it is difficult to distinguish between two
close alternatives without a large quantity of data. The situation is illustrated
in Fig. 11.2 where we have plotted the power $\beta$ against the auxiliary
parameter

$$ \Delta = \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}} $$

for some sample values of $\alpha$, the significance level.



Fig. 11.2. The power of a test comparing two means for a normal population with known 
variance. 
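
The curves of Fig. 11.2 follow directly from Eqn (11.24), because $W_a$ is a standard normal variable when $H_a$ is true. The sketch below tabulates the power for a few values of the auxiliary parameter, here called `delta` and taken to be $(\mu_a - \mu_0)/(\sigma/\sqrt{n})$; the function name and the chosen grid of values are assumptions.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power_two_tailed(delta, alpha):
    """Power of the two-tailed test from Eqn (11.24).

    delta is the auxiliary parameter (mu_a - mu_0) / (sigma / sqrt(n)).
    """
    w = Z.inv_cdf(1.0 - alpha / 2.0)                   # W_{alpha/2}
    p_type2 = Z.cdf(w - delta) - Z.cdf(-w - delta)     # acceptance probability under H_a
    return 1.0 - p_type2

for delta in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(f"delta = {delta:.0f}: power = {power_two_tailed(delta, 0.05):.3f}")
```

At `delta = 0` the power reduces to the significance level, and it rises towards 1 as the two hypotheses move apart or the sample grows.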

Example 11.7. Four measurements of a quantity give values

1.12; 1.13; 1.10; 1.09.

If they all come from normal populations with $\sigma^2 = 4 \times 10^{-4}$ we can apply
the $\chi^2$ goodness-of-fit test to see if it is a reasonable assumption that the
populations all have the same mean, i.e. are identical. The sample mean is

$$\bar{x} = 1.11,$$

and so

$$\chi^2 = \frac{1}{\sigma^2} \sum_{i=1}^{4} (\bar{x} - x_i)^2 = 2.5,$$

and this value is for three degrees of freedom. From the tables we have

$$P[\chi^2(3) \ge 2.5] \approx 0.47,$$
and so it is very reasonable to assume that the measurements all came from
the same normal population. We can now test the hypothesis that the
common mean is some particular value, e.g.

$$H_0:\ \mu = \mu_0 = 1.09,$$

against the alternative

$$H_a:\ \mu \ne 1.09.$$

Proceeding as outlined above we calculate the test statistic

$$W_0 = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = 2.0.$$

Thus, for a two-tailed test at a 5% significance level, i.e. $\alpha = 0.05$, $W_{\alpha/2} = 1.96$,
and since

$$W_0 > W_{\alpha/2},$$

the null hypothesis can be rejected. A 95% confidence interval for $\mu$ is then

$$\bar{x} - \frac{\sigma W_{\alpha/2}}{\sqrt{n}} < \mu < \bar{x} + \frac{\sigma W_{\alpha/2}}{\sqrt{n}},$$

i.e.

$$1.090 < \mu < 1.130.$$

If the alternative hypothesis was

$$H_a:\ \mu = \mu_a = 1.11,$$

then the auxiliary parameter $\Delta$ is

$$\Delta = \frac{|\mu_a - \mu_0|}{\sigma/\sqrt{n}} = 2,$$

and so for $\alpha = 0.05$, $\beta \approx 0.5$, i.e. if only the two possibilities for $\mu$ had
been available, and if the test statistic fell inside the acceptance region, i.e.
we accepted $H_0$, then the probability of having made an incorrect decision
would have been $\approx 50\%$.
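The arithmetic of Example 11.7 can be checked with a short script. This is only a sketch: the variable names are ours, and the closed-form expression used for the upper tail of the $\chi^2$ distribution is valid for three degrees of freedom only (it replaces the table look-up in the text).

```python
from math import erfc, exp, pi, sqrt
from statistics import NormalDist, fmean

x = [1.12, 1.13, 1.10, 1.09]      # the four measurements
sigma = 0.02                      # known standard deviation, sigma^2 = 4e-4
mu0 = 1.09                        # mean under the null hypothesis
n = len(x)
xbar = fmean(x)                   # sample mean

# Goodness-of-fit statistic with n - 1 = 3 degrees of freedom.
chi2 = sum((xbar - xi) ** 2 for xi in x) / sigma**2
# Upper-tail probability P[chi^2(3) >= chi2]; closed form for 3 d.o.f. only.
p_chi2 = erfc(sqrt(chi2 / 2.0)) + sqrt(2.0 * chi2 / pi) * exp(-chi2 / 2.0)

# Two-tailed test of H_0: mu = mu0 at the 5% significance level.
w0 = (xbar - mu0) / (sigma / sqrt(n))
w_crit = NormalDist().inv_cdf(0.975)          # W_{alpha/2} for alpha = 0.05
reject_h0 = abs(w0) > w_crit

# 95% confidence interval for mu.
half_width = sigma * w_crit / sqrt(n)
ci = (xbar - half_width, xbar + half_width)

print(xbar, chi2, p_chi2, w0, reject_h0, ci)
```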

We are now in a position to review the general procedure followed to 
test an hypothesis. 

(i) State the null hypothesis $H_0$, and its alternative $H_a$.

(ii) Specify P[I] and P[II], the probabilities for errors of Types I and II,
respectively, and compute the necessary sample size $n$. In practice
P[I] = $\alpha$ and $n$ are commonly given. However, since even a relatively
small P[II] is usually of practical importance, a check should always
be made to ensure that the values of $\alpha$ and $n$ used lead to a suitable
P[II]. Tables for this purpose applying to some of the tests we will
consider are given in Appendix D.

(iii) Choose a test statistic and determine the critical region for the test.

(iv) Reject the null hypothesis $H_0$ if the value obtained for the sample
statistic falls inside the critical region; otherwise accept it.

A graphical interpretation of the above scheme is shown in Fig. 11.3.
The curve $f_0(t|H_0)$ is the density function of the test statistic $t$ if $H_0$ is true
and $f_a(t|H_a)$ is its density function if $H_a$ is true. The hypothesis $H_0$ is rejected
if $t > t_\alpha$, and $H_a$ is rejected if $t < t_\alpha$. The probabilities of the errors of Types
I and II are also shown.

It is, perhaps, worth emphasizing that failure to reject an hypothesis does
not necessarily mean that the hypothesis is true. However, if we can, on the
basis of the test, reject the hypothesis then we can say that there is
experimental evidence against it.



Legend: the shaded areas represent P[II] = 1 - $\beta$ and P[I] = $\alpha$.

Fig. 11.3. Graphical representation of a general hypothesis test. 


11.3.2. Specific Tests 

We shall now turn to the more practical case where the variance is unknown,
and use the general procedure outlined at the end of Section 11.3.1 to
establish some commonly used tests.

(a) Test of whether the mean is different from some specified value.

The null hypothesis in this case is

$$H_0:\ \mu = \mu_0, \quad 0 < \sigma^2 < \infty,$$

and the alternative is

$$H_a:\ \mu \ne \mu_0, \quad 0 < \sigma^2 < \infty.$$

The parameter space is

$$S = \{-\infty < \mu < \infty;\ 0 < \sigma^2 < \infty\},$$

and the acceptance region associated with the null hypothesis is

$$R = \{\mu = \mu_0;\ 0 < \sigma^2 < \infty\}.$$

The variance $\sigma^2$ is now unknown, but for large samples we could use an
estimate $s^2$, and then take over the results of Section 11.3.1. However, for
small samples this procedure could lead to gross errors and so we must
devise a test where $\sigma^2$ is not explicitly used. Such a test is based on the use
of the Student $t$ distribution which was discussed in Chapter 6. We shall
derive the test as follows.

The likelihood function for a sample of size n drawn from the population 
is given by 


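$$L = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{1}{2} \sum_{i=1}^{n} \left(\frac{x_i - \mu}{\sigma}\right)^2\right]. \qquad (11.25)$$

The M.L.E.'s of $\mu$ and $\sigma^2$ are

$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x},$$

and

$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2. \qquad (11.26)$$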


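Using (11.26) in (11.25) gives

$$L(S) = \left[\frac{n}{2\pi \sum (x_i - \bar{x})^2}\right]^{n/2} e^{-n/2}. \qquad (11.27)$$

To maximize $L$ in $R$ we set $\mu = \mu_0$ giving

$$L = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{1}{2} \sum_{i=1}^{n} \left(\frac{x_i - \mu_0}{\sigma}\right)^2\right],$$

which is a maximum when

$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_0)^2,$$

and hence

$$L(R) = \left[\frac{n}{2\pi \sum (x_i - \mu_0)^2}\right]^{n/2} e^{-n/2}. \qquad (11.28)$$

From (11.27) and (11.28) the generalized likelihood-ratio is

$$\lambda = \left[\frac{\sum (x_i - \bar{x})^2}{\sum (x_i - \mu_0)^2}\right]^{n/2}. \qquad (11.29)$$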


We must now find the distribution of $\lambda$ if $H_0$ is true. Rewriting (11.29) gives

$$\lambda = \left[1 + \frac{n(\bar{x} - \mu_0)^2}{\sum (x_i - \bar{x})^2}\right]^{-n/2} = \left[1 + \frac{t^2}{n-1}\right]^{-n/2},$$

where

$$t = \left[\frac{n(n-1)}{\sum (x_i - \bar{x})^2}\right]^{1/2} (\bar{x} - \mu_0) \qquad (11.30)$$

is distributed as the $t$-distribution with $(n - 1)$ degrees of freedom. Thus,
from (11.30) a critical region of the form $0 < \lambda < A$ is equivalent to the
region $t^2 > F(A)$. Thus a significance level of $\alpha$ corresponds to the pair of
intervals

$$t < -t_{\alpha/2} \quad \text{and} \quad t > t_{\alpha/2},$$

where

$$\int_{t_{\alpha/2}}^{\infty} f(t;\, n-1)\, dt = \alpha/2,$$

and $f(t; n - 1)$ is the $t$-distribution with $(n - 1)$ degrees of freedom. If $t$
lies between $-t_{\alpha/2}$ and $t_{\alpha/2}$, $H_0$ is accepted, otherwise it is rejected. This is
a typical example of a two-tailed test, and is exactly equivalent to constructing
a $100(1 - \alpha)\%$ confidence interval for $\mu$, and accepting $H_0$ if $\mu_0$ lies
within it.
The above result may be summarised as follows:

Observations n values of x

Significance level $\alpha$

Null hypothesis $H_0:\ \mu = \mu_0$

Alternative hypothesis $H_a:\ \mu \ne \mu_0$

Test statistic

$$t = \left[\frac{n(n-1)}{\sum (x_i - \bar{x})^2}\right]^{1/2} (\bar{x} - \mu_0),$$

obeys a $t$-distribution with $n - 1$ degrees of freedom if null hypothesis is
true.

Decision criterion. If observed value of $t$ lies between $-t_{\alpha/2}$ and $t_{\alpha/2}$ where

$$\int_{t_{\alpha/2}}^{\infty} f(t;\, n-1)\, dt = \alpha/2,$$

accept null hypothesis, otherwise reject it.
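As an illustration of the procedure just summarised, the sketch below evaluates the test statistic for the four measurements of Example 11.7, now treating the variance as unknown. Reusing those data here is our own choice, the function name `t_statistic` is hypothetical, and the critical value is simply copied from standard tables of the Student $t$ distribution for three degrees of freedom rather than computed.

```python
from math import sqrt
from statistics import fmean


def t_statistic(x, mu0):
    """Test statistic t = [n(n-1) / sum (x_i - xbar)^2]^(1/2) (xbar - mu0)."""
    n = len(x)
    xbar = fmean(x)
    ss = sum((xi - xbar) ** 2 for xi in x)    # sum of squared deviations
    return sqrt(n * (n - 1) / ss) * (xbar - mu0)


x = [1.12, 1.13, 1.10, 1.09]
t = t_statistic(x, mu0=1.09)

# Tabulated t_{alpha/2} for alpha = 0.05 and n - 1 = 3 degrees of freedom
# (value assumed from standard tables).
T_CRIT = 3.182
accept_h0 = -T_CRIT < t < T_CRIT
print(t, accept_h0)
```

With so few observations the critical value of $t$ is considerably larger than the corresponding normal value of 1.96, so a hypothesis that is rejected when the variance is known need not be rejected when it has to be estimated from the same data.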

The above may be generalized in an obvious way to test the null hypothesis
against $H_a: \mu > \mu_0$ and $H_a: \mu < \mu_0$. The test statistic is the same,
but the critical regions are now $t > t_\alpha$ and $t < t_{1-\alpha}$, respectively.

The above procedure, by specifying the significance level, has controlled
Type I errors. We must now consider the power of the test. This is no longer
a simple problem because if $H_0$ is not true then the statistic $t$ no longer
obeys a Student $t$ distribution. However, if the alternative hypothesis is

$$H_a:\ \mu = \mu_a, \quad 0 < \sigma^2 < \infty,$$


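then, if $H_a$ is true, it can be shown that $t$ obeys a non-central $t$-distribution
of the form

$$f^{nc}(t) = \frac{2^{-(v-1)/2}}{\Gamma(v/2)\,(\pi v)^{1/2}} \left(1 + \frac{t^2}{v}\right)^{-(v+1)/2} \exp\left[-\frac{1}{2}\left(\frac{\delta^2}{1 + t^2/v}\right)\right] \int_0^\infty x^v \exp\left[-\frac{1}{2}\left(x - \frac{\delta t}{(v + t^2)^{1/2}}\right)^2\right] dx,$$

where

$$v = n - 1,$$

and

$$\delta = \frac{|\mu_a - \mu_0|}{\sigma/\sqrt{n}}.$$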
Unfortunately, this distribution (apart from being very complex!) contains
the population variance $\sigma^2$ in the non-centrality parameter $\delta$. An estimate
of the power of the test may be obtained by replacing $\sigma^2$ by the sample
variance $s^2$ in the non-central distribution and then using tables of the
distribution.

Non-central distributions arise typically if we wish to consider the power 
of a test, and generally are functions of a non-centrality parameter, which 
itself is a function of the alternative hypothesis and a population parameter 
which is unknown. 

Another use of the Student $t$ distribution is contained in the following
test, which we will state without proof.

(b) Test of whether the means of two populations having the same variance
differ.

Observations m values of $x_1$ and n values of $x_2$.

Significance level $\alpha$

Null hypothesis $H_0:\ \mu_1 = \mu_2$

Alternative hypothesis $H_a:\ \mu_1 \ne \mu_2$

Test statistic

$$t = \frac{(\bar{x}_1 - \bar{x}_2)\left[mn/(m+n)\right]^{1/2}}{\left\{\left[\sum_i (x_{1i} - \bar{x}_1)^2 + \sum_j (x_{2j} - \bar{x}_2)^2\right] / (m + n - 2)\right\}^{1/2}},$$

obeys a $t$-distribution with $m + n - 2$ degrees of freedom if null hypothesis
is true.

Decision criterion. If observed value of $t$ lies in the range $t_{1-\alpha/2} < t < t_{\alpha/2}$
where

$$\int_{t_{\alpha/2}}^{\infty} f(t;\, m + n - 2)\, dt = \alpha/2, \qquad (11.31)$$

accept null hypothesis, otherwise reject it.
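The two-sample statistic of test (b) may be evaluated as in the following sketch. The function name `pooled_t_statistic` and the two small samples are invented purely for illustration; the resulting value would then be compared with tabulated values of $t$ for $m + n - 2$ degrees of freedom.

```python
from math import sqrt
from statistics import fmean


def pooled_t_statistic(x1, x2):
    """Statistic for testing equal means of two populations with equal variance."""
    m, n = len(x1), len(x2)
    xbar1, xbar2 = fmean(x1), fmean(x2)
    ss1 = sum((v - xbar1) ** 2 for v in x1)   # squared deviations, sample 1
    ss2 = sum((v - xbar2) ** 2 for v in x2)   # squared deviations, sample 2
    pooled_var = (ss1 + ss2) / (m + n - 2)    # pooled variance estimate
    return (xbar1 - xbar2) * sqrt(m * n / (m + n)) / sqrt(pooled_var)


# Hypothetical samples, for illustration only.
t = pooled_t_statistic([1.0, 2.0, 3.0], [2.0, 3.0, 4.0, 5.0])
print(t)   # to be compared with t tables for 3 + 4 - 2 = 5 degrees of freedom
```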

We will now pass on to consider some tests associated with the variance
of a normal population, and, by analogy with the discussion of tests involving
the mean, we shall start by a test of whether the variance is equal to
some specified value.

(c) Test of whether the variance is equal to some specified value.

The null hypothesis in this case is

$$H_0:\ \sigma^2 = \sigma_0^2, \quad -\infty < \mu < \infty,$$

and the alternative is

$$H_a:\ \sigma^2 \ne \sigma_0^2, \quad -\infty < \mu < \infty.$$

The parameter space is

$$S = \{-\infty < \mu < \infty;\ 0 < \sigma^2 < \infty\},$$

and the acceptance region associated with the null hypothesis is

$$R = \{-\infty < \mu < \infty;\ \sigma^2 = \sigma_0^2\}.$$
The mean is not assumed to be known. The test will involve the use of the
$\chi^2$ distribution, which we discussed in Chapter 6, and we will derive it by
the method of likelihood ratios.

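The likelihood function for a sample of size $n$ drawn from the population
is given by

$$L = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{x_i - \mu}{\sigma}\right)^2\right].$$

In the acceptance region $R$ we have

$$L = \frac{1}{(2\pi\sigma_0^2)^{n/2}} \exp\left[-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{x_i - \mu}{\sigma_0}\right)^2\right].$$

This expression is a maximum when

$$\sum (x_i - \mu)^2$$

is a minimum, i.e. when $\mu = \bar{x}$, the arithmetic mean of the sample. Thus

$$L(R) = \frac{1}{(2\pi\sigma_0^2)^{n/2}} \exp\left[-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{\sigma_0}\right)^2\right],$$

i.e.

$$L(R) = \frac{1}{(2\pi\sigma_0^2)^{n/2}} \exp\left[-\frac{(n-1)s^2}{2\sigma_0^2}\right],$$

where $s^2$ is the sample variance. To maximize $L$ in $S$ we have to solve the
M.L. equations. The solutions have been given in Eqns (11.26) and hence

$$L(S) = \left(\frac{1}{2\pi}\right)^{n/2} \frac{1}{s^n} \left(\frac{n}{n-1}\right)^{n/2} \exp(-n/2).$$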
We may now form the generalized likelihood ratio

$$\lambda = \frac{L(R)}{L(S)} = \left[\frac{(n-1)s^2}{n\sigma_0^2}\right]^{n/2} \exp\left\{-\frac{1}{2}\left[\frac{(n-1)s^2}{\sigma_0^2} - n\right]\right\},$$

from which we see that a critical region of the form $\lambda < k$ is equivalent to
rejecting $H_0$ unless

$$k_1 < \frac{s^2}{\sigma_0^2} < k_2,$$

where $k_1$ and $k_2$ are constants depending on $n$ and $\alpha$, the significance level
of the test. Now, if $H_0$ is true then $(n - 1)s^2/\sigma_0^2$ obeys a $\chi^2$ distribution
with $(n - 1)$ degrees of freedom and so, in principle, the required values of
$k_1$ and $k_2$ could be found. However, a good approximation to the optimum
values is obtained by choosing values of $k_1$ and $k_2$ using equal right and
left tails of the distribution. Thus we are led to the following test procedure.


Observations n values of x

Significance level $\alpha$

Null hypothesis $H_0:\ \sigma^2 = \sigma_0^2$

Alternative hypothesis $H_a:\ \sigma^2 \ne \sigma_0^2$

Test statistic

$$\chi_0^2 = \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{\sigma_0}\right)^2 = \frac{s^2}{\sigma_0^2}(n-1),$$

obeys a $\chi^2$-distribution with $n - 1$ degrees of freedom if null hypothesis is
true.

Decision criterion. If $\chi_0^2$ lies in the interval

$$\chi^2_{1-\alpha/2} < \chi_0^2 < \chi^2_{\alpha/2},$$

where $\chi^2_{\alpha}$ is defined by analogy with Eqn (11.31), accept the hypothesis,
otherwise reject it.

Again, this test may be simply adapted to deal with the hypotheses
$H_a: \sigma^2 > \sigma_0^2$ and $H_a: \sigma^2 < \sigma_0^2$.
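A minimal sketch of the two-tailed variance test follows. It reuses the measurements of Example 11.7 with the hypothetical null value $\sigma_0^2 = 4 \times 10^{-4}$; the function name is ours, and the two limits are copied from standard $\chi^2$ tables for three degrees of freedom rather than computed.

```python
from statistics import fmean


def chi2_variance_statistic(x, sigma0_sq):
    """Statistic chi_0^2 = sum ((x_i - xbar) / sigma_0)^2 = (n - 1) s^2 / sigma_0^2."""
    xbar = fmean(x)
    return sum((xi - xbar) ** 2 for xi in x) / sigma0_sq


x = [1.12, 1.13, 1.10, 1.09]
chi2_0 = chi2_variance_statistic(x, sigma0_sq=4e-4)

# Tabulated limits for alpha = 0.05 and 3 degrees of freedom (assumed from
# standard tables): chi^2_{1 - alpha/2} and chi^2_{alpha/2}.
CHI2_LOW, CHI2_HIGH = 0.216, 9.348
accept_h0 = CHI2_LOW < chi2_0 < CHI2_HIGH
print(chi2_0, accept_h0)
```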

We now have to ask the question: what is the probability of a Type II
error in this test? If the alternative hypothesis is

$$H_a:\ \sigma^2 = \sigma_a^2,$$

and if $H_a$ is true, then the quantity

$$\chi_a^2 = \frac{s^2}{\sigma_a^2}(n-1),$$

will be distributed as $\chi^2$ with $n - 1$ degrees of freedom. Thus, from the
definition of the power function, we have

$$\beta = 1 - P\left[\chi^2_{\alpha/2}(n-1) > (n-1)\frac{s^2}{\sigma_0^2} > \chi^2_{1-\alpha/2}(n-1)\right],$$

and therefore,

$$\beta = 1 - P\left[\chi^2_{\alpha/2}(n-1)\frac{\sigma_0^2}{\sigma_a^2} > (n-1)\frac{s^2}{\sigma_a^2} > \chi^2_{1-\alpha/2}(n-1)\frac{\sigma_0^2}{\sigma_a^2}\right].$$

Having fixed the significance level $\alpha$ and the values of $\sigma_0$ and $\sigma_a$, we can now
read off from tables the probability that a chi-square variate with $(n - 1)$
degrees of freedom lies between the two limits in the square brackets.

The final test that we will consider concerns the equality of the variances 
of two normal populations. 

(d) Test of whether the variances of two populations differ.

Observations m values of $x_1$, n values of $x_2$

Significance level $\alpha$

Null hypothesis $H_0:\ \sigma_1^2 = \sigma_2^2$

Alternative hypothesis $H_a:\ \sigma_1^2 \ne \sigma_2^2$

Test statistic

$$F = \frac{\sum_i (x_{1i} - \bar{x}_1)^2 / (m-1)}{\sum_j (x_{2j} - \bar{x}_2)^2 / (n-1)} = \frac{s_1^2}{s_2^2},$$

obeys the $F$ distribution with $m - 1$ and $n - 1$ degrees of freedom if $H_0$ is
true.

Decision criterion. If the sample value of $F$ lies in the range

$$\frac{1}{F_{\alpha/2}(n-1,\, m-1)} < F < F_{\alpha/2}(m-1,\, n-1),$$

where $F_{\alpha/2}$ is defined by analogy with Eqn (11.31), accept the hypothesis,
otherwise reject it.
To calculate the power of the test, we note that P[II] depends on the value
of $\sigma_1^2/\sigma_2^2$. If the true value of this ratio is $\delta$ then, since $(m - 1)s_1^2/\sigma_1^2$ for
a sample from a normal population is distributed as $\chi^2(m - 1)$, we find that
$s_1^2/s_2^2$ is distributed as

$$\frac{\sigma_1^2}{\sigma_2^2} F(m-1,\, n-1) = \delta F(m-1,\, n-1).$$

Thus,

$$\beta = 1 - P\left[F_{1-\alpha/2}(m-1,\, n-1) < \frac{s_1^2}{s_2^2} < F_{\alpha/2}(m-1,\, n-1)\right],$$

is equivalent to

$$\beta = 1 - P\left[\frac{F_{1-\alpha/2}(m-1,\, n-1)}{\delta} < F < \frac{F_{\alpha/2}(m-1,\, n-1)}{\delta}\right].$$

For any given value of $\delta$ these limits may be found from tables of the $F$
distribution. It can be shown by consulting these tables that the power of
the $F$-test is rather small unless the ratio of variances is very large, a result
which is in accordance with common-sense.

11.4 Linear hypotheses 

In Section 8.1.2 we briefly mentioned the use of the $\chi^2$ and $F$ distributions
as goodness-of-fit tests in connection with the use of the linear least-squares
method of estimation. These applications were designed to test hypotheses
concerning the quality of the approximation of the observations by some
assumed expression linear in the parameters. We shall generalize that discussion
now to consider some other hypothesis tests which can be performed
using the least-squares results.

11.4.1. Introduction 

We have seen in Section 8.1 that the weighted sum of residuals

$$S = R^T V^{-1} R,$$

where

$$R = Y - \Phi\hat{\Theta},$$

and $V$ is the variance matrix of the observations, is distributed as $\chi^2$ with
$n - p$ degrees of freedom, where $n$ is the number of observations and $p$ is
the total number of parameters $\theta_k$ $(k = 1, 2, \ldots, p)$. We also saw (cf. Eqn (8.28))
that

$$R^T V^{-1} R = (Y - Y^0)^T V^{-1} (Y - Y^0) - (\hat{\Theta} - \Theta)^T M^{-1} (\hat{\Theta} - \Theta), \qquad (11.32)$$
where $M$ is the variance matrix of the parameters. It follows from the additive
property of $\chi^2$ that since

$$(Y - Y^0)^T V^{-1} (Y - Y^0)$$

is distributed as $\chi^2$ with $n$ degrees of freedom, the quantity

$$(\hat{\Theta} - \Theta)^T M^{-1} (\hat{\Theta} - \Theta),$$

is distributed as $\chi^2$ with $p$ degrees of freedom. Now to test deviations from
the best least-squares values for the parameters we need to know the
distribution of

$$(\hat{\Theta} - \Theta)^T E^{-1} (\hat{\Theta} - \Theta),$$

where $E$ is the error matrix of Eqn (8.30). In the notation of Section 8.1.2

$$E = \hat{\sigma}^2 (\Phi^T W \Phi)^{-1},$$

and so

$$(\hat{\Theta} - \Theta)^T E^{-1} (\hat{\Theta} - \Theta) = \frac{1}{\hat{\sigma}^2} (\hat{\Theta} - \Theta)^T (\Phi^T W \Phi)(\hat{\Theta} - \Theta),$$

i.e.

$$(\hat{\Theta} - \Theta)^T E^{-1} (\hat{\Theta} - \Theta) = (\hat{\Theta} - \Theta)^T M^{-1} (\hat{\Theta} - \Theta) \left(\frac{\sigma^2}{\hat{\sigma}^2}\right).$$

But we have seen above that

$$(\hat{\Theta} - \Theta)^T M^{-1} (\hat{\Theta} - \Theta),$$

is distributed as $\chi^2$ with $p$ degrees of freedom, and so

$$\frac{1}{p} (\hat{\Theta} - \Theta)^T E^{-1} (\hat{\Theta} - \Theta),$$

is distributed as

$$\frac{\chi^2(p)/p}{\chi^2(n-p)/(n-p)} = F(p,\, n-p).$$

Thus to test the hypothesis

$$H_0:\ \Theta = \Theta_0,$$

we compute the test statistic

$$F_0 = \frac{1}{p} (\hat{\Theta} - \Theta_0)^T E^{-1} (\hat{\Theta} - \Theta_0),$$

and reject the hypothesis at a significance level of $\alpha$ if

$$F_0 > F_\alpha(p,\, n-p).$$


11.4.2. General Theory 

In Section 8.2, we considered the least-squares method in the presence of
linear constraints on the parameters. By analogy we will now generalize the
discussion in Section 11.4.1 above to include the general linear hypothesis

$$H_0:\ C_{lp}\,\Theta_p = Z_l, \quad l \le p, \qquad (11.33)$$

which may be an hypothesis about all of the parameters, or any subset of
them.

The null hypothesis $H_0$ may be tested by comparing the least-squares
solution for the weighted sum of residuals when $H_0$ is true, i.e. $S_c$, with the
sum in the unconstrained situation, i.e. $S$. In the notation of Section 8.2,
the additional sum of residuals $S_A = S_c - S$, which is present if the hypothesis
$H_0$ is true, is distributed as $\chi^2$ with $l$ degrees of freedom, independently
of $S$, which itself is distributed as $\chi^2$ with $(n - p)$ degrees of freedom.
Thus the ratio

$$F = \frac{S_A/l}{S/(n-p)},$$

is distributed as $F(l, n - p)$. Using the results of Section 8.2 we can then
show that

$$F = \frac{1}{l} (Z - C\hat{\Theta})^T (C E C^T)^{-1} (Z - C\hat{\Theta}). \qquad (11.34)$$

Thus $H_0$ is rejected at the $\alpha$ significance level if $F > F_\alpha(l, n - p)$ (compare
the discussion at the end of Section 8.1.2).

Example 11.8. An experiment results in the following estimates for three
parameters, based on ten measurements

$$\hat{\Theta}_1 = 2; \quad \hat{\Theta}_2 = 4; \quad \hat{\Theta}_3 = 1,$$

with an associated error matrix


Test the hypothesis 



$$H_0:\ \Theta_1 = 0, \quad \Theta_2 = 0,$$


at the 5 % significance level. 

We have, for the above hypothesis, 



and thus the calculated value of $F$ from Eqn (11.34) is 6. From a set of
tables we can find that

$$F_\alpha(l,\, n-p) = F_{0.05}(2,\, 7) = 4.74,$$

and so we can reject the hypothesis at this significance level.

Finally, we have to consider the power of the test of the general linear
hypothesis, i.e. we have to find the distribution of $F$ if $H_0$ is not true. Now
$S/(n - p)$ is distributed as $\chi^2/(n - p)$ regardless of whether $H_0$ is true or
false, but $S_A/l$ is only distributed as $\chi^2/l$ if $H_0$ is true. If $H_0$ is false then
$S_A/l$ will in general be distributed as non-central $\chi^2$ which has, for $l$ degrees
of freedom, a density function

$$f^{nc}(\chi^2;\, l) = \sum_{p=0}^{\infty} \left(\frac{e^{-\lambda} \lambda^p}{p!}\right) f(\chi^2;\, l + 2p),$$

where $f(\chi^2; l + 2p)$ is the density function for a $\chi^2$ variable with $(l + 2p)$
degrees of freedom, and the non-centrality parameter is

$$\lambda = \tfrac{1}{2} (C\Theta - Z)^T (C M C^T)^{-1} (C\Theta - Z).$$

It follows that $F$ is distributed as a non-central $F$ distribution. Tables of the
latter distribution are available to construct the power curves. A feature of
the non-central $F$ distribution is that the power of the test increases as $\lambda$
increases.
1 Matrix algebra

A matrix is a two-dimensional array of numbers (taken to be real in these
notes) which is written as

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix},$$

where the general element in the ith row and jth column is denoted by $a_{ij}$.
A matrix with m rows and n columns is said to be of order (m x n).

The transpose of the matrix A is obtained by interchanging the rows and
columns of A, and is denoted by $A^T$.

Two matrices may be added if they contain the same number of rows and
columns, and such addition is both commutative and associative. The (inner)
product of two matrices is defined only if the number of columns in the
first matrix is equal to the number of rows in the second. Thus the product

$$A = BC,$$

requires that

$$a_{ij} = \sum_k b_{ik} c_{kj}.$$


Matrix multiplication is not , in general, commutative, but is associative. 
Before we can discuss division of matrices we must consider the special 
properties of square matrices, i.e. those having an equal number of rows and 
columns. 

The determinant of a square (n x n) matrix A is defined as

$$\det A = |A| = \sum (\pm a_{1i} a_{2j} \ldots a_{nk}), \qquad (A1)$$

where the summation is taken over all permutations of i, j, ..., k where these
indices are the integers 1, 2, ..., n. The positive sign is used for even permutations
and the negative sign for odd permutations.

The minor $m_{ij}$ of the element $a_{ij}$ is defined as the determinant obtained
from |A| by deleting the ith row and the jth column, and the cofactor of
$a_{ij}$ is defined as $(-1)^{i+j}$ times the minor $m_{ij}$. The adjoint matrix is then defined
as the transposed matrix of cofactors and is denoted by $A^\dagger$. Thus, if


A = 




If $A^\dagger = A^T$ the matrix A is said to be orthogonal.

If we form all possible square submatrices of the matrix A (not necessarily 
square), and find that at least one determinant of order r is non-zero, but 
that all determinants of order (r + 1) vanish, then the matrix is said to be of 
rank r. A square matrix of order n with rank r < n has det A = 0 and is 
said to be singular . 

If the square matrix A has elements such that $a_{ij} = a_{ji}$ it is said to be
symmetric. A particular example of a symmetric matrix is the unit matrix 1
with elements equal to unity for i = j, and zero otherwise.

If the square matrix A, of order n, has rank r = n then it is non-singular
and there exists a matrix $B = A^{-1}$, known as the inverse matrix, such that

$$A A^{-1} = A^{-1} A = 1.$$

This is clearly the equivalent process to division in scalar algebra. The inverse
is given by

$$A^{-1} = A^\dagger |A|^{-1}. \qquad (A2)$$

A few further definitions will be necessary. A square matrix with elements
$a_{ij} \ne 0$ for i = j only is called diagonal and the unit matrix given above is
an example of such a matrix. The line on which the elements are non-zero
is called the principal diagonal.

A symmetric matrix A is said to be positive definite if for any vector V,
(i) $V^T A V \ge 0$ and (ii) $V^T A V = 0$ implies V = 0 where 0 is the null vector,
i.e. a vector consisting of all zeros.

Finally, for products of matrices,

$$(ABC \ldots D)^T = D^T \ldots C^T B^T A^T,$$

and, if A, B, C, ..., D are all square non-singular matrices,

$$(ABC \ldots D)^{-1} = D^{-1} \ldots C^{-1} B^{-1} A^{-1}.$$
A point worth remarking is that in practice Eqns (A1) and (A2) are
rarely useful for the practical evaluation of the determinant and inverse
of a matrix. For example, in the least-squares method where the matrix of
the normal equations (which is positive definite) has to be inverted, the most
efficient methods in common use are those based either on Choleski's decomposition
of a positive definite matrix, or on Golub's factorization by
orthogonal matrices, the details of which may be found in any modern
textbook on numerical methods.
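To indicate what is involved, the following is a minimal sketch of Choleski's decomposition used to solve a set of equations with a symmetric positive definite matrix, without forming the inverse explicitly. It is written in plain Python for clarity rather than efficiency; the function names `cholesky` and `solve_spd` and the small test system are ours, and a library routine would normally be preferred in practice.

```python
from math import sqrt


def cholesky(a):
    """Return the lower-triangular factor L with a = L L^T.

    `a` is a symmetric positive definite matrix given as a list of rows.
    """
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def solve_spd(a, b):
    """Solve a x = b using the Choleski factor of a."""
    n = len(a)
    low = cholesky(a)
    y = [0.0] * n
    for i in range(n):                        # forward substitution: L y = b
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):              # back substitution: L^T x = y
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


# Small illustrative system: 4x + 2y = 2, 2x + 3y = 1.
solution = solve_spd([[4.0, 2.0], [2.0, 3.0]], [2.0, 1.0])
print(solution)
```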

Matrices are frequently used in Chapter 8 to write the set of n linear
equations in p unknowns

$$\sum_{j=1}^{p} a_{ij} x_j = b_i, \quad i = 1, 2, \ldots, n,$$

in the compact form

$$AX = B,$$

where A is of order (n x p), X is a (p x 1) column vector and B is an (n x 1)
column vector.

The set of n column vectors $X_i$ (i = 1, 2, ..., n), all of the same order, are
said to be linearly dependent if there exist n scalars $\alpha_i$ (i = 1, 2, ..., n), not all
zero, such that

$$\sum_{i=1}^{n} \alpha_i X_i = 0.$$

If no such set of scalars exists, then the set of vectors is said to be linearly
independent. It follows that if the vectors are the columns of a square matrix
S, then if $|S| \ne 0$ the columns of S are linearly independent. The rank of a
matrix, defined previously, may thus be expressed as the greatest number of
linearly independent rows or columns existing in the matrix, and so, for
example, a non-singular square matrix of order (n x n) must have rank n.


2 Classical theory of minima 

If f(x) is a function of the single variable x which in a certain interval
possesses continuous derivatives $f^{(j)}(x)$ (j = 1, 2, ..., n + 1), then Taylor's
Theorem states that if x and (x + h) belong to this interval then

$$f(x + h) = \sum_{j=0}^{n} \frac{h^j}{j!} f^{(j)}(x) + R_n,$$

where $f^{(0)}(x) = f(x)$, and the remainder term is given by

$$R_n = \frac{h^{n+1}}{(n+1)!} f^{(n+1)}(x + \theta h), \quad 0 < \theta < 1.$$

For a function of p variables Taylor's expansion becomes

$$f(\mathbf{x} + t\mathbf{h}) = \sum_{j=0}^{n} \frac{t^j}{j!} (\mathbf{h}\nabla)^j f(\mathbf{x}) + R_n,$$

where h is the row vector $(h_1\ h_2\ \ldots\ h_p)$, $\nabla^T$ is the row vector

$$\left(\frac{\partial}{\partial x_1}\ \frac{\partial}{\partial x_2}\ \ldots\ \frac{\partial}{\partial x_p}\right),$$

and

$$R_n = \frac{t^{n+1}}{(n+1)!} (\mathbf{h}\nabla)^{n+1} f(\mathbf{x} + \theta t \mathbf{h}), \quad 0 < \theta < 1.$$

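A necessary condition for a turning point (maximum, minimum or saddle
point) of f(x) to exist is that

$$\frac{\partial f(\mathbf{x})}{\partial x_i} = 0$$

for all i = 1, 2, ..., p. A sufficient condition for this point to be a minimum
is that the second partial derivatives exist, and that $D_i > 0$ for all i = 1, 2,
..., p where

$$D_i = \begin{vmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_i} \\ \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_i} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial^2 f}{\partial x_i \partial x_1} & \dfrac{\partial^2 f}{\partial x_i \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_i^2} \end{vmatrix}.$$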

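If we seek a minimum of f(x) subject to the s equality constraints

$$e_j(\mathbf{x}) = 0, \quad j = 1, 2, \ldots, s,$$

then the quantity to consider is the Lagrangian form

$$L(\mathbf{x}, \lambda) = f(\mathbf{x}) + \sum_{j=1}^{s} \lambda_j e_j(\mathbf{x}),$$

where the constants $\lambda_j$ are the so-called Lagrange multipliers. If the first
partial derivatives of $e_j(\mathbf{x})$ exist then the required minimum is the unconstrained
solution of the equations

$$\frac{\partial L(\mathbf{x}, \lambda)}{\partial x_i} = 0, \quad i = 1, 2, \ldots, p; \qquad \frac{\partial L(\mathbf{x}, \lambda)}{\partial \lambda_j} = 0, \quad j = 1, 2, \ldots, s.$$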
Orthogonal Polynomials

In Chapter 8, on the linear least-squares method of estimation, we encountered
the problem of fitting a set of n observations $y_j$ (j = 1, 2, ..., n)
in terms of p parameters $\theta_k$ (k = 1, 2, ..., p < n) by the series

$$f_j = \sum_{k=1}^{p} \theta_k \phi_k(x_j), \qquad (B1)$$

where $x_j$ are the points at which the observations are made. This fitting
procedure was done by minimizing the weighted sum of residuals

$$S = \sum_{i=1}^{n} \sum_{j=1}^{n} r_i r_j (V^{-1})_{ij}, \qquad (B2)$$

where

$$r_i = y_i - f_i,$$

and V is the variance matrix of the observations. The solution was given by

$$\hat{\Theta} = (\Phi^T W \Phi)^{-1} \Phi^T W Y, \qquad (B3)$$

where $W = \sigma^2 V^{-1}$ is the weight matrix of the observations Y, $\sigma^2$ is a scale
factor and

$$\Phi = \begin{pmatrix} \phi_1(x_1) & \phi_2(x_1) & \cdots & \phi_p(x_1) \\ \phi_1(x_2) & \phi_2(x_2) & \cdots & \phi_p(x_2) \\ \vdots & \vdots & & \vdots \\ \phi_1(x_n) & \phi_2(x_n) & \cdots & \phi_p(x_n) \end{pmatrix}.$$

As was remarked in Chapter 8, if $f_j$ is chosen to be a power series in $x_j$
then the matrix

$$E = (\Phi^T W \Phi), \qquad (B4)$$

is ill-conditioned, and the degree of ill-conditioning increases as n becomes
larger. Thus, serious rounding errors could occur if $\hat{\Theta}$ is calculated from
Eqn (B3). To avoid this, the fitting functions should, if possible, be chosen
so that E is a diagonal matrix. Such functions are called orthogonal polynomials
and we shall discuss them briefly in this Appendix.

We will assume that the observations are uncorrelated (this is the situation
frequently met in practice) and denote the diagonal elements of the weight
matrix by $W(x_j)$ (j = 1, 2, ..., n). Then if we fit in terms of the polynomials
$\psi_k(x)$ (for k = 1, 2, ..., p) the matrix of the normal equations is diagonal if

$$\sum_{j=1}^{n} W(x_j) \psi_r(x_j) \psi_s(x_j) = 0,$$

for $r \ne s$, and from Eqn (8.14) the least-squares estimate of $\theta_k$ is

$$\hat{\theta}_k = \frac{\sum_{j=1}^{n} W(x_j)\, y_j\, \psi_k(x_j)}{\sum_{j=1}^{n} W(x_j)\, \psi_k^2(x_j)}, \quad k = 1, 2, \ldots, p. \qquad (B5)$$

Another valuable feature of using orthogonal polynomials is seen if we
calculate the weighted sum of squared residuals at the minimum. From Eqn
(8.12) this is, for a fit of order p,

$$S_p = \sum_{j=1}^{n} W(x_j) \left[y_j^2 - \sum_{k=1}^{p} \hat{\theta}_k^2 \psi_k^2(x_j)\right], \qquad (B6)$$

and thus if we now perform a fit of order p + 1, $S_p$ is reduced by

$$\hat{\theta}_{p+1}^2 \sum_{j=1}^{n} W(x_j) \psi_{p+1}^2(x_j),$$

and the first p coefficients $\hat{\theta}_k$ (k = 1, 2, ..., p) are unchanged.

To construct the polynomials we will assume that the values of $x_j$ are
normalized to lie in the interval $(-1, 1)$, and since it is also desirable that
none of the $\psi_k(x)$ has a large absolute value we will arrange that the leading
coefficient of $\psi_k(x)$ is $2^{k-2}$. In this case it can be shown that the polynomials
satisfy the following recurrence relations. (The derivation of these relations
may be found in any modern textbook on numerical analysis.)

$$\psi_1(x) = 1/2,$$

$$\psi_2(x) = (2x + \beta_1)\psi_1(x),$$

and for $r \ge 2$,

$$\psi_{r+1}(x) = (2x + \beta_r)\psi_r(x) + \gamma_{r-1}\psi_{r-1}(x). \qquad (B7)$$

To calculate the coefficients $\beta$ and $\gamma$ we apply the orthogonality condition
to $\psi_s$ and $\psi_{r+1}$, i.e.

$$\sum_{j=1}^{n} W(x_j) \psi_s(x_j) \psi_{r+1}(x_j) = 0, \quad s \ne r + 1. \qquad (B8)$$

If we use (B7) in (B8) and set s = r and then s = r - 1, we are led immediately
to the results

$$\beta_r = \frac{-2 \sum_{j=1}^{n} W(x_j)\, x_j\, \psi_r^2(x_j)}{\sum_{j=1}^{n} W(x_j)\, \psi_r^2(x_j)}, \quad r = 1, 2, \ldots,$$

and

$$\gamma_{r-1} = \frac{-\sum_{j=1}^{n} W(x_j)\, \psi_r^2(x_j)}{\sum_{j=1}^{n} W(x_j)\, \psi_{r-1}^2(x_j)}.$$
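The recurrence relations lend themselves to a direct numerical construction. The sketch below tabulates the polynomials at the data points, computing each coefficient from the weighted sums given above; the function name `orthogonal_polynomials`, the equally spaced points and the unit weights are our own illustrative choices.

```python
def orthogonal_polynomials(x, w, p):
    """Tabulate psi_1 ... psi_p at the points x for weights w.

    Returns a list `psi` where psi[k][j] is psi_{k+1}(x_j). The points are
    assumed to be normalized to (-1, 1) and to number at least p.
    """
    def wdot(u, v):
        # Weighted sum over the data points: sum_j W(x_j) u_j v_j.
        return sum(wj * uj * vj for wj, uj, vj in zip(w, u, v))

    psi = [[0.5] * len(x)]                          # psi_1(x) = 1/2
    for r in range(1, p):                           # build psi_{r+1}
        cur = psi[-1]
        norm = wdot(cur, cur)
        beta = -2.0 * wdot(cur, [xj * cj for xj, cj in zip(x, cur)]) / norm
        nxt = [(2.0 * xj + beta) * cj for xj, cj in zip(x, cur)]
        if r >= 2:                                  # gamma term enters for r >= 2
            prev = psi[-2]
            gamma = -norm / wdot(prev, prev)
            nxt = [nj + gamma * pj for nj, pj in zip(nxt, prev)]
        psi.append(nxt)
    return psi


# Five equally spaced points with unit weights (illustrative values).
x = [-1.0, -0.5, 0.0, 0.5, 1.0]
w = [1.0] * len(x)
psi = orthogonal_polynomials(x, w, 4)

# The off-diagonal elements of the normal-equation matrix should vanish.
for a in range(4):
    for b in range(a):
        print(a + 1, b + 1, sum(wj * u * v for wj, u, v in zip(w, psi[a], psi[b])))
```

Once the polynomials are tabulated, each coefficient follows independently from Eqn (B5), so that raising the order of the fit does not require the earlier coefficients to be recomputed.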



Appendix C 

Optimization of Functions of Several Variables 

In Chapters 7 and 8 we encountered the problem of finding the maxima,
or minima, of nonlinear functions of several variables. These are examples
of a more general class of problems which are currently the subject of intensive
research. Strictly, these problems of optimization, although occurring
frequently in statistical estimation procedures, are not of a statistical nature,
and we confine the discussion of this Appendix to the main ideas involved.
For fuller details one should consult one of the excellent books listed in
the appropriate section of the bibliography, on which these notes draw
extensively.


1 Introduction 

We will consider only minimization problems since

$$\min f(\mathbf{x}) = -\max[-f(\mathbf{x})].$$

The general problem to be solved may then be stated as follows. Minimize
the objective function $f(x_1, x_2, \ldots, x_p) = f(\mathbf{x})$, subject to the m inequality
constraints

$$c_i(\mathbf{x}) \ge 0, \quad i = 1, 2, \ldots, m,$$

and the s equality constraints,

$$e_j(\mathbf{x}) = 0, \quad j = 1, 2, \ldots, s.$$

All other constraints can be reduced to either of the above forms by suitable
transformations. We will discuss firstly a few of the features of methods of
optimization in general and then, in later sections, discuss in more detail a
few of the more successful methods in current use.

Any point which satisfies all the constraints is called feasible, and the
entire set of such points is called the feasible region. Points lying outside the
feasible region are said to be non-feasible. Nearly all methods of optimization
are iterative in the sense that an initial feasible vector $\mathbf{x}^{(0)}$ must be
specified from which the method will generate a series of vectors $\mathbf{x}^{(1)}$, $\mathbf{x}^{(2)}$, ...,
$\mathbf{x}^{(n)}$ etc. which represent improved approximations to the solution. The
exception to this occurs in simple tabulation methods where, e.g. in a two-variable
problem, a two-dimensional grid of values of $f(x_1, x_2)$ could be
calculated and scanned for a minimum. Although systematic methods are
available for such multidimensional grid-searching, far more efficient methods
exist for locating optima, and so we shall not discuss them further.

It is convenient to express the iterative procedure by the equation

$$\mathbf{x}^{(n+1)} = \mathbf{x}^{(n)} + h_n \mathbf{d}_n, \qquad (C1)$$

where $\mathbf{d}_n$ is a p-dimensional directional vector, and $h_n$ is the distance moved
along it. The basic problem is to determine the most suitable vector $\mathbf{d}_n$,
since once $\mathbf{d}_n$ is chosen the function $f(\mathbf{x})$ can be calculated and a suitable
value of $h_n$ found. Iterative techniques fall naturally into two classes, (a)
Direct Search Methods, and (b) Gradient Methods.

Direct search methods are based on a sequential examination of a series
of trial solutions produced from an initial feasible point. On the basis of the
examinations, the strategy for further searching is determined. These methods
are characterized by the fact they only require explicitly values of the objective
function, and, in particular, a knowledge of the derivatives of $f(\mathbf{x})$ is
not required. The latter fact is both a strength and a weakness of the methods,
for although in problems involving many variables the calculation of derivatives
can be difficult, nevertheless it is clear that more efficient methods
should be possible if more information (i.e. in the form of the derivatives)
is supplied.

In practice direct search methods are most useful for situations involving 
a few parameters, or where the calculation of derivatives is very difficult, 
or for finding promising regions in the parameter space where optima might 
reasonably be located. 

Gradient methods, as their name implies, make explicit use of the partial 
derivatives of the objective function, in addition to values of the function 
itself. The gradient direction at any point is that direction whose components 
are proportional to the first-order partial derivatives of the objective function 
at the point. The importance of this quantity will be seen as follows. If we 
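make small perturbations $\delta\mathbf{x}$ from the current point $\mathbf{x}$ then, to first order

$$\delta f = \sum_{j=1}^{p} \frac{\partial f}{\partial x_j}\,\delta x_j. \qquad \text{(C2)}$$

To obtain the perturbation giving the greatest change in the function we
have to consider the Lagrangian form

$$F(\mathbf{x}, \lambda) = \delta f + \lambda\left(\sum_{j=1}^{p} \delta x_j^2 - \Delta^2\right), \qquad \text{(C3)}$$

where $\Delta^2$ is the squared magnitude of the perturbations, i.e.

$$\Delta^2 = \sum_{j=1}^{p} \delta x_j^2.$$

Using (C2) in (C3), and differentiating with respect to $\delta x_j$ gives

$$\frac{\partial f}{\partial x_j} + 2\lambda\,\delta x_j = 0,$$

and hence

$$\frac{\delta x_1}{\partial f/\partial x_1} = \frac{\delta x_2}{\partial f/\partial x_2} = \cdots = \frac{\delta x_p}{\partial f/\partial x_p},$$

i.e. for any $\Delta$, the greatest value of $\delta f$ is obtained if the perturbations $\delta x_j$
are chosen to be proportional to $\partial f/\partial x_j$, and that, further, if $\delta f < 0$, i.e.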
the search is to converge to a minimum, the constant of proportionality 
must be negative. This direction is called the direction of steepest descent , 
and it follows that the objective function can always be reduced by following 
the direction of steepest descent, although this may only be true for a short 
distance. 
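The result above can be checked numerically. The following is a minimal sketch, assuming an illustrative two-variable objective `f` and its analytic gradient `grad_f` (neither is taken from the text). It compares the first-order change (C2) for a perturbation along the negative gradient with that for many other perturbations of the same magnitude.

```python
import math
import random

def f(x):
    # Illustrative objective (an assumption, not from the text).
    return (x[0] - 1.0) ** 2 + 4.0 * (x[1] + 0.5) ** 2

def grad_f(x):
    # Analytic first derivatives of f.
    return [2.0 * (x[0] - 1.0), 8.0 * (x[1] + 0.5)]

def first_order_change(g, dx):
    # Equation (C2): delta f = sum_j (df/dx_j) * delta x_j.
    return sum(gj * dj for gj, dj in zip(g, dx))

x = [3.0, 2.0]
delta = 1e-3                         # magnitude of every perturbation
g = grad_f(x)
g_norm = math.sqrt(sum(gj * gj for gj in g))

# Steepest descent: perturbation proportional to the gradient,
# with a negative constant of proportionality.
dx_sd = [-delta * gj / g_norm for gj in g]
df_sd = first_order_change(g, dx_sd)

# Most negative first-order change among 1000 random directions
# of the same magnitude delta.
random.seed(0)
df_random_best = min(
    first_order_change(g, [delta * math.cos(t), delta * math.sin(t)])
    for t in (random.uniform(0.0, 2.0 * math.pi) for _ in range(1000))
)
```

No random direction gives a larger reduction than `dx_sd`, and for this small `delta` the function itself is also reduced by the step.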

One remark that is worth making about gradient methods concerns the 
actual calculation of the derivatives. Although gradient methods are, in 
general, more efficient than direct search methods, their efficiency can drop 
considerably if the derivatives are not obtained analytically, and so if
numerical methods are used to calculate these quantities great care should be
exercised to ensure that inaccuracies do not result. 

Up to now we have not specified the form of the objective function,
except that it is nonlinear in its variables. However, in many practical
problems involving unconstrained functions it is found that the objective
function can be well-approximated by a quadratic form in the neighbourhood of
the minimum. There is therefore considerable interest in methods that guarantee
to find the minimum of a quadratic in a specified number of steps. Such
methods are said to be quadratically convergent, and the hope is that
problems which are not strictly quadratic may still be tractable by such
methods, a hope which is borne out rather well in practice.

The most useful of the methods having the property of quadratic
convergence are those making use of the so-called conjugate directions defined
as follows.

Definition C.1. Two direction vectors $\mathbf{d}_1$ and $\mathbf{d}_2$ are said to be conjugate
with respect to the positive definite matrix $\mathbf{G}$ if

$$\mathbf{d}_1^T \mathbf{G}\,\mathbf{d}_2 = 0.$$

The importance of conjugate directions in optimization problems stems from
the following theorem.

Theorem C.1. If $\mathbf{d}_i$ $(i = 1, 2, \ldots, p)$ are a set of vectors mutually conjugate
with respect to the positive definite matrix $\mathbf{G}$, then the minimum of the
quadratic form

$$f(\mathbf{x}) = \tfrac{1}{2}\,\mathbf{x}^T \mathbf{G}\,\mathbf{x} + \mathbf{b}^T \mathbf{x} + a, \qquad \text{(C4)}$$

can be found from an arbitrary point $\mathbf{x}^{(0)}$ by a finite descent calculation in which
each of the vectors $\mathbf{d}_i$ is used as a descent direction only once, their order of
use being arbitrary.

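Proof. Since the set of $\mathbf{d}_i$ are linearly independent any arbitrary vector $\mathbf{v}$
can be written in the form

$$\mathbf{v} = \sum_{i=1}^{p} \alpha_i \mathbf{d}_i, \qquad \text{(C5)}$$

where, because of the conjugacy of the vectors $\mathbf{d}_i$,

$$\alpha_i = \frac{\mathbf{d}_i^T \mathbf{G}\,\mathbf{v}}{\mathbf{d}_i^T \mathbf{G}\,\mathbf{d}_i}. \qquad \text{(C6)}$$

If the general iterative equation (C1) is applied repeatedly, then at the
$n$th stage we have

$$\mathbf{x}^{(n)} = \mathbf{x}^{(0)} + \sum_{i=1}^{n} h_i \mathbf{d}_i, \qquad \text{(C7)}$$

and in the $(n+1)$th stage, from $\mathbf{x}^{(n)} \to \mathbf{x}^{(n+1)}$, the distance $h_{n+1}$ along the
direction $\mathbf{d}_{n+1}$ is found from the equation

$$\mathbf{d}_{n+1}^T \nabla f[\mathbf{x}^{(n+1)}] = 0.$$

Using (C4) this gives

$$\mathbf{d}_{n+1}^T \left\{ \mathbf{G}\left( \mathbf{x}^{(0)} + \sum_{i=1}^{n} h_i \mathbf{d}_i + h_{n+1}\mathbf{d}_{n+1} \right) + \mathbf{b} \right\} = 0,$$

i.e.

$$h_{n+1} = -\frac{\mathbf{d}_{n+1}^T (\mathbf{G}\,\mathbf{x}^{(0)} + \mathbf{b})}{\mathbf{d}_{n+1}^T \mathbf{G}\,\mathbf{d}_{n+1}}, \qquad \text{(C8)}$$

which depends only on $\mathbf{x}^{(0)}$, and not on the path by which $\mathbf{x}^{(n+1)}$ is reached
from $\mathbf{x}^{(0)}$. Using (C8) in (C7) for $p$ steps gives

$$\mathbf{x}^{(p)} = \mathbf{x}^{(0)} - \sum_{i=1}^{p} \frac{\mathbf{d}_i^T (\mathbf{G}\,\mathbf{x}^{(0)} + \mathbf{b})}{\mathbf{d}_i^T \mathbf{G}\,\mathbf{d}_i}\,\mathbf{d}_i, \qquad \text{(C9)}$$

and using (C5) and (C6) this becomes

$$\mathbf{x}^{(p)} = \mathbf{x}^{(0)} - \mathbf{x}^{(0)} - \mathbf{G}^{-1}\mathbf{b} = -\mathbf{G}^{-1}\mathbf{b}, \qquad \text{(C10)}$$

which shows that the minimum has been reached in $p$ iterations.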
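Theorem C.1 can be illustrated with a short numerical sketch. The matrix `G`, vector `b` and starting point `x0` below are arbitrary illustrative values, and the conjugate set is built by a Gram-Schmidt process with respect to the product $\mathbf{u}^T\mathbf{G}\mathbf{v}$, which is only one of many possible ways of generating such directions. The step lengths are taken directly from (C8).

```python
def matvec(G, v):
    return [sum(gij * vj for gij, vj in zip(row, v)) for row in G]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_directions(G):
    # Gram-Schmidt on the coordinate axes, using u^T G v as inner product.
    p = len(G)
    dirs = []
    for i in range(p):
        d = [1.0 if j == i else 0.0 for j in range(p)]
        for e in dirs:
            c = dot(e, matvec(G, d)) / dot(e, matvec(G, e))
            d = [dj - c * ej for dj, ej in zip(d, e)]
        dirs.append(d)
    return dirs

def descend(G, b, x0, dirs):
    # Use each direction once, with the step length of equation (C8).
    grad0 = [gi + bi for gi, bi in zip(matvec(G, x0), b)]   # G x0 + b
    x = list(x0)
    for d in dirs:
        h = -dot(d, grad0) / dot(d, matvec(G, d))
        x = [xi + h * di for xi, di in zip(x, d)]
    return x

# Illustrative positive definite quadratic form (values are assumptions).
G = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, -2.0, 0.5]
x0 = [5.0, -3.0, 2.0]

dirs = conjugate_directions(G)
x_min = descend(G, b, x0, dirs)
x_min_rev = descend(G, b, x0, dirs[::-1])      # directions in reverse order
grad_at_min = [gi + bi for gi, bi in zip(matvec(G, x_min), b)]
```

After the three steps the gradient $\mathbf{G}\mathbf{x} + \mathbf{b}$ vanishes to rounding error, and the same point is reached when the directions are used in the reverse order.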

Although methods having the property of quadratic convergence will
guarantee to converge to the exact minimum of a quadratic in $p$ steps, where
$p$ is the dimensionality of the problem, when applied to functions which are
not strictly quadratic the problem arises of determining when convergence
has taken place. This is, in general, a difficult problem, but in practice a
suitable criterion is to consider that convergence has been achieved if, for
given small values of $\varepsilon$ and $\varepsilon'$,

$$f(\mathbf{x}^{(n)}) - f(\mathbf{x}^{(n+1)}) < \varepsilon,$$

and/or

$$|\mathbf{x}^{(n)} - \mathbf{x}^{(n+1)}| < \varepsilon',$$

for a sequence of $q$ successive iterations, where $q$ is a number which will
vary with the type of function being minimized, but at a very generous
overestimate $q \sim p$, the number of variables. In practice a considerably smaller
number of values is usually sufficient.
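A convergence test of this kind might be coded as follows. This is a sketch only: the function name `converged` and the choice to require both conditions (the "and" reading of "and/or") are assumptions made for illustration.

```python
import math

def converged(history, eps, eps_prime, q):
    """history is a list of (x, f(x)) pairs, most recent last.

    Returns True if both tests hold for the last q successive iterations.
    """
    if len(history) < q + 1:
        return False
    recent = history[-(q + 1):]
    for (x_old, f_old), (x_new, f_new) in zip(recent, recent[1:]):
        step = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_old, x_new)))
        if not (f_old - f_new < eps and step < eps_prime):
            return False
    return True

# Illustrative sequence: x is halved at each iteration, f(x) = x^2.
history = [([0.5 ** n], (0.5 ** n) ** 2) for n in range(30)]
early = converged(history[:12], eps=1e-6, eps_prime=1e-3, q=2)
late = converged(history[:13], eps=1e-6, eps_prime=1e-3, q=2)
```

Requiring the tests to hold over several successive iterations guards against stopping on a single unusually small step.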

Finally, it should be mentioned that all present techniques for optimizing
nonlinear functions locate only local optima, i.e. points $\mathbf{x}_m$ at which
$f(\mathbf{x}_m) < f(\mathbf{x})$ for all $\mathbf{x}$ in a region in the neighbourhood of $\mathbf{x}_m$. For
multivariate problems there could well be better local optima located at some
distance from $\mathbf{x}_m$. At present there are no general methods for locating the
global optimum (i.e. the absolute optimum) of a function, and so it is
essential to restart the search procedure from many different initial points
$\mathbf{x}^{(0)}$ to ensure that the full $p$-dimensional space has been explored.
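The need for several starting points can be seen in even a one-variable example. The sketch below assumes an illustrative function with two local minima and uses a fixed-step descent as a stand-in for whichever local search is preferred. The step size and iteration count are arbitrary choices.

```python
def f(x):
    # Illustrative function with two local minima (an assumption).
    return x ** 4 - 3.0 * x ** 2 + x

def df(x):
    return 4.0 * x ** 3 - 6.0 * x + 1.0

def local_descent(x, step=0.01, iterations=5000):
    # Simple fixed-step descent; it finds only the nearest local minimum.
    for _ in range(iterations):
        x -= step * df(x)
    return x

starts = [-2.0, -0.5, 0.5, 2.0]
local_minima = [local_descent(x0) for x0 in starts]
best_x = min(local_minima, key=f)      # best of the local optima found
```

Two of the four starts end in the poorer minimum at positive `x`. Only by comparing the results of all the starts is the better minimum at negative `x` identified.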
2 Unconstrained minimization

We will start the discussion of specific techniques by considering the 
unconstrained problem, for which powerful general techniques have been 
devised. 


C.2.1. Univariate Problems 

The problem of minimizing a function of one variable is very important 
in practice, because, as we shall see later, many methods for optimizing 
multivariate functions proceed by a series of searches along a line in the 
parameter space, and each of these searches is equivalent to a univariate 
search. 

Univariate searches fall into two groups (a) those which specify an interval 
within which the minimum lies, and (b) those which specify the minimum 
by a point approximating it. The latter methods are the most useful in
practice and we shall only consider them here. The basic procedure is as follows.
Proceeding from an initial point $x^{(0)}$ a systematic search technique is applied
to find a region containing the minimum. This bracket is then refined by 
fitting a quadratic interpolation polynomial to the three points making up 
the bracket, and locating the minimum of this polynomial. As a result of 
this evaluation a new bracket is formed, and the procedure is repeated. 
This method is both simple and very safe in practice. 

As an example of the above technique we will give the method of Davies, 
Swann and Campey. 

(a) Method of Davies, Swann and Campey

To bracket the minimum the function is first evaluated at $x^{(0)}$ and
$x^{(0)} + h$. If $f(x^{(0)} + h) < f(x^{(0)})$, then $f(x^{(0)} + 2h)$ is evaluated. This
doubling of the step-length $h$ is repeated until a value of $f(x)$ is found such
that $f(x^{(0)} + 2^n h) > f(x^{(0)} + 2^{n-1} h)$. At this point the step-length is halved
and a step again taken from the last successful point, i.e. the $(n-1)$th.
This procedure produces four points equally spaced along the axis of search,
at each of which the function has been evaluated. The end point furthest
from the point corresponding to the smallest function value is rejected, and
the remaining three points used for quadratic interpolation. Had the first
step failed then the search is continued by reversing the sign of the step
length. If the first step in this direction also fails then the minimum has
been bracketed and the interpolation may be made. If the three points used
for the interpolation are $x_1$, $x_2$, $x_3$ with $x_1 < x_2 < x_3$ and
$x_3 - x_2 = x_2 - x_1 = l$, then the minimum of the fitted quadratic is at

$$x_m = x_2 + \frac{l\,[f(x_1) - f(x_3)]}{2\,[f(x_1) - 2f(x_2) + f(x_3)]}.$$

An iteration is completed by evaluating $f(x_m)$. Convergence tests are now
applied and, if required, a further iteration is performed, with a reduced
step length, using as the initial point whichever of $x_2$ or $x_m$ corresponds to
the smaller function value.
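One iteration of the procedure just described might be sketched as follows. The names `dsc_iteration` and `quadratic_minimum`, the guard `max_doublings`, and the handling of the case in which the very first doubling fails (where the three points already evaluated are equally spaced and are used directly) are assumptions made for this illustration. The test function is a simple parabola.

```python
def quadratic_minimum(f, x1, x2, x3):
    # Minimum of the quadratic through three equally spaced points.
    l = x2 - x1
    f1, f2, f3 = f(x1), f(x2), f(x3)
    return x2 + l * (f1 - f3) / (2.0 * (f1 - 2.0 * f2 + f3))

def dsc_iteration(f, x0, h, max_doublings=60):
    f0 = f(x0)
    if f(x0 + h) >= f0:
        h = -h                              # first step failed: reverse
        if f(x0 + h) >= f0:                 # failed both ways: bracketed
            return quadratic_minimum(f, x0 - abs(h), x0, x0 + abs(h))
    pts = [x0, x0 + h]
    for k in range(1, max_doublings + 1):
        pts.append(x0 + (2 ** k) * h)       # points x0 + 2^k h
        if f(pts[-1]) > f(pts[-2]):
            break
    else:
        raise RuntimeError("no bracket found")
    if len(pts) == 3:
        chosen = pts                        # x0, x0+h, x0+2h already equally spaced
    else:
        a, b, d = pts[-3:]
        c = b + 0.5 * (d - b)               # halved step from last successful point
        four = [a, b, c, d]
        best = min(four, key=f)
        # Reject the end point furthest from the best of the four.
        if abs(four[-1] - best) > abs(four[0] - best):
            chosen = four[:-1]
        else:
            chosen = four[1:]
    x1, x2, x3 = sorted(chosen)
    return quadratic_minimum(f, x1, x2, x3)

def parabola(x):
    return (x - 2.0) ** 2 + 1.0

xm_forward = dsc_iteration(parabola, 0.0, 0.1)
xm_reverse = dsc_iteration(parabola, 5.0, 0.1)
xm_bracketed = dsc_iteration(parabola, 2.02, 0.1)
```

For an exactly quadratic function a single interpolation recovers the minimum. For other functions the iteration would be repeated with a reduced step length, as the text describes.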


3 Multivariate problems 


C.3.1. Direct Search Methods 

As we mentioned in Section C.2.1, univariate searches are important as
many methods for locating optima of multivariate functions are based on
a series of linear searches along a line in the parameter space. By a linear
method we will therefore mean any technique which uses a set of direction
vectors in the search, and which proceeds by explorations along these
directions, deciding future strategy by the results obtained in previous searches.

The simplest of all possible methods would be to keep $(p - 1)$ of the
parameters fixed and find a minimum with respect to the $p$th parameter,
doing this in turn for each variable. The progress of such an
alternating-variable search is shown in Fig. C.1 for the case of two variables.
In general the contours of equal function value will be aligned along the
so-called principal axes, which are not parallel to the coordinate axes. In
this case only very small steps will be taken at each stage and the technique
is very inefficient, being worse the larger the number of variables. It would
clearly be very much more efficient to reorientate the direction vectors along
the principal axes, and this is done in several techniques, one of the most
successful of which is due to Rosenbrock.



Fig. C.1. Typical progress in an alternating variable search.
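The inefficiency of the alternating-variable search when the principal axes are inclined to the coordinate axes can be demonstrated with a small sketch. It assumes the quadratic $f = \tfrac{1}{2}\mathbf{x}^T\mathbf{G}\mathbf{x}$, for which the minimum along each coordinate is available in closed form. The two matrices are illustrative.

```python
def alternating_search(G, x0, tol=1e-8, max_cycles=10000):
    # Minimizes f = 1/2 x^T G x one coordinate at a time.
    x = list(x0)
    p = len(x)
    for cycle in range(1, max_cycles + 1):
        x_old = list(x)
        for i in range(p):
            # Exact minimum with respect to x_i, the others held fixed.
            x[i] = -sum(G[i][j] * x[j] for j in range(p) if j != i) / G[i][i]
        if max(abs(a - b) for a, b in zip(x, x_old)) < tol:
            return x, cycle
    return x, max_cycles

# Contours aligned with the coordinate axes.
x_aligned, cycles_aligned = alternating_search([[2.0, 0.0], [0.0, 8.0]], [1.0, 1.0])

# Contours whose principal axes lie at 45 degrees to the coordinate axes.
x_tilted, cycles_tilted = alternating_search([[2.0, 1.8], [1.8, 2.0]], [1.0, 1.0])
```

In the aligned case the minimum is located in the first cycle (the second merely confirms it), whereas in the inclined case each cycle reduces the distance to the minimum by only a constant factor and many more cycles are needed.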
(a) Rosenbrock's Method

The method of Rosenbrock utilizes $p$ orthonormal direction vectors,
which are initially taken to be the coordinate directions, and reorientates
them during the search procedure such that one of them lies in the direction
of total recent progress of the search. We will denote the $p$ direction vectors
at the $n$th stage by $\mathbf{d}_i^{(n)}$ $(i = 1, 2, \ldots, p)$ and by $h_i$ the step length associated
with each of these directions. A single stage of the method is then as follows.

A step $h_i$ is taken in the direction $\mathbf{d}_i^{(n)}$ from a given starting point $\mathbf{x}^{(n-1)}$.
If $f(\mathbf{x}^{(n)}) < f(\mathbf{x}^{(n-1)})$ the step is considered successful and $h_i$ is multiplied
by a fixed multiplier $\alpha > 1$, the new point being kept and the success recorded.
If $f(\mathbf{x}^{(n)}) > f(\mathbf{x}^{(n-1)})$ the step is retracted, $h_i$ is multiplied by $\beta$, where
$-1 < \beta < 0$, and a failure is recorded. This procedure is repeated for all variables
in turn from 1 to $p$ and then starting again at 1, cycling until a success
followed by a failure is recorded along every one of the $p$ directions. A new
set of orthonormal vectors is now constructed as follows.

If $s_i$ is the algebraic sum of all the successful steps in the direction $\mathbf{d}_i^{(n)}$
during the $n$th stage, we can define vectors $\mathbf{a}_i$ $(i = 1, 2, \ldots, p)$ by

$$\mathbf{a}_i = \sum_{k=i}^{p} s_k \mathbf{d}_k^{(n)},$$

so that $\mathbf{a}_1$ represents the total progress made during the complete stage,
$\mathbf{a}_2$ the total progress made excluding that made in the direction $\mathbf{d}_1^{(n)}$ etc.
Since the set $\mathbf{a}_i$ are linearly independent they can be used to generate a new
set of orthonormal directions by the Gram-Schmidt procedure which gives

$$\mathbf{d}_k' = \mathbf{a}_k - \sum_{l=1}^{k-1} \left(\mathbf{a}_k^T \mathbf{d}_l^{(n+1)}\right) \mathbf{d}_l^{(n+1)},$$

with

$$\mathbf{d}_k^{(n+1)} = \mathbf{d}_k' / |\mathbf{d}_k'|,$$

and $k = 1, 2, \ldots, p$. Thus a new stage can be started with the new set of
direction vectors $\mathbf{d}_i^{(n+1)}$, and the procedure is repeated until some suitable
convergence criteria are satisfied.
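A single stage of this method, together with the Gram-Schmidt reorientation, might be sketched as below. The values `alpha = 3.0` and `beta = -0.5`, the guard `max_trials`, the test function and the decision to skip the reorientation if any $s_i$ is essentially zero are all assumptions made for the illustration.

```python
import math

def rosenbrock_stage(f, x, dirs, steps, alpha=3.0, beta=-0.5, max_trials=10000):
    p = len(x)
    x, steps = list(x), list(steps)
    fx = f(x)
    progress = [0.0] * p          # s_i: algebraic sum of successful steps
    had_success = [False] * p
    done = [False] * p            # success followed by failure recorded
    i = 0
    for _ in range(max_trials):
        if all(done):
            break
        trial = [xj + steps[i] * dj for xj, dj in zip(x, dirs[i])]
        f_trial = f(trial)
        if f_trial < fx:          # success: keep point, lengthen step
            x, fx = trial, f_trial
            progress[i] += steps[i]
            steps[i] *= alpha
            had_success[i] = True
        else:                     # failure: retract, reverse and shorten step
            steps[i] *= beta
            if had_success[i]:
                done[i] = True
        i = (i + 1) % p
    return x, fx, progress, steps

def new_directions(dirs, progress):
    p = len(dirs)
    # a_i = sum over k >= i of s_k d_k
    a = [[sum(progress[k] * dirs[k][j] for k in range(i, p)) for j in range(p)]
         for i in range(p)]
    new = []
    for ak in a:
        v = list(ak)
        for d in new:             # Gram-Schmidt orthogonalization
            c = sum(u * w for u, w in zip(ak, d))
            v = [vj - c * dj for vj, dj in zip(v, d)]
        norm = math.sqrt(sum(vj * vj for vj in v))
        new.append([vj / norm for vj in v])
    return new

def f(x):
    # Illustrative quadratic with inclined principal axes.
    return x[0] ** 2 + 1.8 * x[0] * x[1] + x[1] ** 2

x = [3.0, -2.0]
dirs = [[1.0, 0.0], [0.0, 1.0]]
steps = [0.5, 0.5]
f_start = f(x)
for _ in range(3):
    x, fx, progress, steps = rosenbrock_stage(f, x, dirs, steps)
    if all(abs(s) > 1e-12 for s in progress):
        dirs = new_directions(dirs, progress)
```

After each stage the first new direction lies along the total progress $\mathbf{a}_1$, and the set remains orthonormal.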

(b) Powell's Method

A more powerful method which is based on the use of conjugate
directions rather than the orthonormal vectors of Rosenbrock's method is due
to Powell. It uses the fact that, for a positive definite quadratic form, if
searches for minima are made along $p$ conjugate directions then the join of
these minima is conjugate to all of those directions, a result that follows
from Theorem C.1.

In the usual notation, the procedure is to start from $\mathbf{x}^{(0)}$ and locate the
minimum $\mathbf{x}^{(1)}$ in the direction $\mathbf{d}_1^{(n)}$. Then from $\mathbf{x}^{(1)}$ locate the minimum
$\mathbf{x}^{(2)}$ in the direction $\mathbf{d}_2^{(n)}$ etc. until the minimum $\mathbf{x}^{(p)}$ in the direction $\mathbf{d}_p^{(n)}$ is
found. The direction of total progress made during this cycle is then

$$\mathbf{d} = \mathbf{x}^{(p)} - \mathbf{x}^{(0)}.$$

Provided that certain conditions hold the minimum in the direction $\mathbf{d}$ is
then found and used as a starting point in the next iteration, the list of
direction vectors being updated as follows

$$\left(\mathbf{d}_1^{(n+1)}, \mathbf{d}_2^{(n+1)}, \ldots, \mathbf{d}_p^{(n+1)}\right) = \left(\mathbf{d}_1^{(n)}, \mathbf{d}_2^{(n)}, \ldots, \mathbf{d}_{j-1}^{(n)}, \mathbf{d}_{j+1}^{(n)}, \ldots, \mathbf{d}_p^{(n)}, \mathbf{d}\right),$$

where $\mathbf{d}_j^{(n)}$ is that direction vector along which the greatest reduction in
the function value occurred during the $n$th stage. Care must be taken in this
updating procedure to ensure that the new direction vectors are always
linearly independent.

Powell shows that for the quadratic function of Eqn (C4), if $\mathbf{d}_i^{(n)}$ is scaled
so that

$$\mathbf{d}_i^{(n)T} \mathbf{G}\,\mathbf{d}_i^{(n)} = 1, \qquad i = 1, 2, \ldots, p,$$

then the determinant $D$ of the matrix whose columns are $\mathbf{d}_i^{(n)}$ has a maximum
if, and only if, the vectors $\mathbf{d}_i^{(n)}$ are mutually conjugate with respect to $\mathbf{G}$.
Thus the direction $\mathbf{d}$ only replaces an existing search direction if by so doing
$D$ is increased.
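The cycle described above can be sketched for the quadratic form (C4), where the minimum along any line is available in closed form. This is a simplified illustration: it always accepts the new direction $\mathbf{d}$ and omits the determinant test, and the function names, the matrix `G`, the vector `b` and the starting point are assumptions.

```python
def matvec(G, v):
    return [sum(gij * vj for gij, vj in zip(row, v)) for row in G]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def fval(G, b, x):
    return 0.5 * dot(x, matvec(G, x)) + dot(b, x)

def line_min(G, b, x, d):
    # Exact minimum of the quadratic along direction d from x.
    grad = [gi + bi for gi, bi in zip(matvec(G, x), b)]
    h = -dot(d, grad) / dot(d, matvec(G, d))
    return [xi + h * di for xi, di in zip(x, d)]

def powell_sketch(G, b, x0, cycles=5, tol=1e-12):
    p = len(x0)
    dirs = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    x = list(x0)
    for _ in range(cycles):
        start = list(x)
        reductions = []
        for d in dirs:                       # minimize along each direction in turn
            new = line_min(G, b, x, d)
            reductions.append(fval(G, b, x) - fval(G, b, new))
            x = new
        d_total = [a - s for a, s in zip(x, start)]
        if dot(d_total, d_total) < tol:      # no further progress
            break
        x = line_min(G, b, x, d_total)       # minimize along total progress
        j = reductions.index(max(reductions))
        dirs = dirs[:j] + dirs[j + 1:] + [d_total]
    return x

G = [[4.0, 1.0], [1.0, 3.0]]
b = [-1.0, -2.0]
x_min = powell_sketch(G, b, [2.0, 1.0])
```

For this two-variable quadratic the minimum $-\mathbf{G}^{-1}\mathbf{b} = (1/11, 7/11)$ is reached after two cycles. For general functions the exact line minimization would be replaced by a univariate search such as that of Section C.2.1.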


C.3.2. Gradient Methods 

The earliest method using gradients is that of steepest descent mentioned 
previously. In this method the normalized gradient vector at the current
point is found, and using a step-length $h_n$ a new point is generated via the
general iterative equation. This procedure is continued until a function value
is found which has not decreased. The step-length is then reduced and the 
search restarted from the best previous point. If the actual minimum along 
each search direction is located then the performance of this method is 
similar in appearance to the alternating variable search, and, in particular, 
is rather erratic, the search directions oscillating about the principal axes. 
A method that in principle is far better is based on an examination of the 
second derivatives of the function. 
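(a) Newton's Method

A second-order Taylor expansion of the objective function $f(\mathbf{x})$ about
the minimum $\mathbf{x}^*$ is

$$f(\mathbf{x}) = f(\mathbf{x}^*) + \sum_{j=1}^{p} h_j \left(\frac{\partial f}{\partial x_j}\right)^{*} + \frac{1}{2} \sum_{j=1}^{p} \sum_{k=1}^{p} h_j h_k \left(\frac{\partial^2 f}{\partial x_j\,\partial x_k}\right)^{*},$$

where $\mathbf{h} = \mathbf{x} - \mathbf{x}^*$ and * means that these quantities are evaluated at
$\mathbf{x} = \mathbf{x}^*$. Differentiating this equation gives

$$g_i = \frac{\partial f}{\partial x_i} = \sum_{k=1}^{p} h_k \left(\frac{\partial^2 f}{\partial x_i\,\partial x_k}\right)^{*}, \qquad i = 1, 2, \ldots, p, \qquad \text{(C11)}$$

since the first derivatives vanish at the minimum. The minimum is therefore
obtained in one step by the move $\mathbf{x}^* = \mathbf{x} - \mathbf{h}$, where the components of $\mathbf{h}$
are found by solving the $p$ linear equations (C11). If we define

$$G_{jk} = \frac{\partial^2 f}{\partial x_j\,\partial x_k},$$

then we have

$$\mathbf{x}^* = \mathbf{x} - \mathbf{G}^{*-1}\mathbf{g}.$$

Since $\mathbf{G}^{*-1}$ will not, of course, be known it is usual to replace it by $\mathbf{G}^{-1}$
evaluated at the current point $\mathbf{x}^{(n)}$ and use the iterative equation

$$\mathbf{x}^{(n+1)} = \mathbf{x}^{(n)} - \mathbf{G}_n^{-1}\mathbf{g}_n. \qquad \text{(C12)}$$

The method is clearly quadratically convergent but suffers from severe
difficulties.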

Firstly, there is the numerical problem of calculating the inverse matrix
of second derivatives, and secondly, and more seriously, for a general
function $\mathbf{G}^{-1}$ is not guaranteed to be positive definite, and in this case the
method will diverge. Thus, while Newton's method is efficient in the
immediate neighbourhood of a minimum, away from this point it has little
to recommend it, the method of steepest descent being far preferable.
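The iteration (C12) is easily sketched for two variables, where the linear equations can be solved directly. The objective $f = \cosh x + (y - x)^2$ is an assumption chosen for illustration. Its matrix of second derivatives is positive definite everywhere, so the difficulty described above does not arise for it.

```python
import math

def grad(x, y):
    # First derivatives of f = cosh(x) + (y - x)^2.
    return (math.sinh(x) - 2.0 * (y - x), 2.0 * (y - x))

def hessian(x, y):
    # Matrix G of second derivatives.
    return ((math.cosh(x) + 2.0, -2.0), (-2.0, 2.0))

def newton_step(x, y):
    g1, g2 = grad(x, y)
    (a, b), (c, d) = hessian(x, y)
    det = a * d - b * c
    # Solve G h = g for h (2 x 2 case), then move to x - h.
    h1 = (d * g1 - b * g2) / det
    h2 = (a * g2 - c * g1) / det
    return x - h1, y - h2

x, y = 1.0, -1.0
for _ in range(8):
    x, y = newton_step(x, y)
```

Starting from $(1, -1)$ the iterates approach the minimum at the origin very rapidly once they are close to it, in line with the remark that the method is efficient in the immediate neighbourhood of a minimum.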

In view of the above remarks an efficient method would be one that starts 
by using the method of steepest descent and, at a later stage, uses Newton’s 
method. A method which does this automatically is due to Davidon, and 
probably represents the most powerful method currently available for 
optimizing unconstrained functions. 
(b) Davidon's Method

Davidon's method is an iterative scheme based on successive
approximations to the matrix $\mathbf{G}^{*-1}$. The best approximation to this matrix, say $\mathbf{H}_n$,
is used to define a new search direction by a modification of Eqn (C12), i.e.

$$\mathbf{x}^{(n+1)} = \mathbf{x}^{(n)} - h_n \mathbf{H}_n \mathbf{g}_n,$$

where $\mathbf{g}_n$ is the vector of first derivatives of $f(\mathbf{x}^{(n)})$ with respect to $\mathbf{x}^{(n)}$. The
step length $h_n$ is that necessary to find the minimum in the search direction
$\mathbf{d}_n = -\mathbf{H}_n \mathbf{g}_n$, and may be found by any univariate search procedure. If the
sequence $\{\mathbf{H}_n\}$ is positive definite, it can be shown that the convergence of
this method is guaranteed. Furthermore, if the search directions $\mathbf{d}_n$ are
mutually conjugate, then the method is quadratically convergent.

Davidon has shown that both of these conditions can be met if, at each
stage of the iteration, the matrix $\mathbf{H}_n$ is updated according to the equation

$$\mathbf{H}_{n+1} = \mathbf{H}_n + \mathbf{A}_n + \mathbf{B}_n,$$

where

$$\mathbf{A}_n = \frac{-h_n \left[\mathbf{H}_n \mathbf{g}_n \mathbf{g}_n^T \mathbf{H}_n^T\right]}{(\mathbf{H}_n \mathbf{g}_n)^T \mathbf{v}},$$

and

$$\mathbf{B}_n = \frac{-\mathbf{H}_n \mathbf{v}\,\mathbf{v}^T \mathbf{H}_n}{\mathbf{v}^T \mathbf{H}_n \mathbf{v}}, \qquad \mathbf{v} = \mathbf{g}_{n+1} - \mathbf{g}_n.$$

It is usual to start the iteration from $\mathbf{H}_0 = \mathbf{1}$, the unit matrix. The matrix
$\mathbf{A}_n$ ensures that the sequence $\{\mathbf{H}_n\}$ converges to $\mathbf{G}^{*-1}$, and $\mathbf{B}_n$ ensures that
each $\mathbf{H}_n$ is positive definite. The derivation of these expressions may be found
in the book by Kowalik and Osborne, cited in the Bibliography.
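The update formulas can be tried on the quadratic form (C4), for which the step length $h_n$ is available in closed form. The sketch below assumes an illustrative `G`, `b` and starting point. For a general function the exact line minimization would be replaced by a univariate search.

```python
def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def davidon_quadratic(G, b, x0):
    p = len(x0)
    H = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]  # H_0 = 1
    x = list(x0)
    g = [gi + bi for gi, bi in zip(matvec(G, x), b)]
    for _ in range(p):
        Hg = matvec(H, g)
        d = [-c for c in Hg]                               # d_n = -H_n g_n
        h = -dot(d, g) / dot(d, matvec(G, d))              # exact line minimum
        x = [xi + h * di for xi, di in zip(x, d)]
        g_new = [gi + bi for gi, bi in zip(matvec(G, x), b)]
        v = [a - c for a, c in zip(g_new, g)]              # v = g_{n+1} - g_n
        Hv = matvec(H, v)
        A = [[-h * Hg[i] * Hg[j] / dot(Hg, v) for j in range(p)] for i in range(p)]
        B = [[-Hv[i] * Hv[j] / dot(v, Hv) for j in range(p)] for i in range(p)]
        H = [[H[i][j] + A[i][j] + B[i][j] for j in range(p)] for i in range(p)]
        g = g_new
    return x, H

G = [[4.0, 1.0], [1.0, 3.0]]
b = [-1.0, -2.0]
x_min, H_final = davidon_quadratic(G, b, [2.0, 1.0])
```

After $p = 2$ steps the minimum $(1/11, 7/11)$ is reached and `H_final` agrees with $\mathbf{G}^{-1}$ to rounding error, which illustrates both the quadratic convergence and the convergence of $\{\mathbf{H}_n\}$.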


4 Constrained minimization 

Constrained optimization, not surprisingly, is a more difficult problem 
than unconstrained optimization, and only a very brief discussion will be 
given here. 

Firstly, an obvious remark. If the constraints can be removed by suitable
transformations then this, of course, should be done. For example many
problems involve the simple parameter constraints

$$l \le x \le u,$$

which can be removed entirely by the transformation

$$x = l + (u - l)\sin^2 y,$$

thereby enabling an unconstrained minimization to be performed with respect
to $y$. Such transformations cannot produce additional local optima.
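As a small illustration of this transformation, the sketch below minimizes the assumed function $f(x) = (x - 3)^2$ subject to $0 \le x \le 2$ by an unconstrained fixed-step descent in $y$. The step size, starting value and iteration count are arbitrary choices.

```python
import math

def to_x(y, lower, upper):
    # x = l + (u - l) sin^2(y) always lies in [l, u].
    return lower + (upper - lower) * math.sin(y) ** 2

def f(x):
    # Illustrative objective; its unconstrained minimum (x = 3) is infeasible.
    return (x - 3.0) ** 2

def dfdy(y, lower, upper):
    # Chain rule: df/dy = f'(x) * dx/dy, with dx/dy = (u - l) sin(2y).
    x = to_x(y, lower, upper)
    return 2.0 * (x - 3.0) * (upper - lower) * math.sin(2.0 * y)

lower, upper = 0.0, 2.0
y = 0.3
for _ in range(500):
    y -= 0.05 * dfdy(y, lower, upper)    # unconstrained descent in y
x_best = to_x(y, lower, upper)
```

The search in $y$ is free to wander anywhere, yet the corresponding $x$ never leaves the allowed interval and settles on the boundary value $x = 2$.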

If the constraints cannot be removed then one of the simplest ways of 
incorporating them is to arrange that the production of non-feasible points 
is unattractive. This is the basis of the method of penalty functions. 


C.4.1. Penalty Functions 

If the problem is to minimize $f(\mathbf{x})$ subject to the $m$ inequality constraints

$$c_i(\mathbf{x}) \ge 0, \qquad i = 1, 2, \ldots, m,$$

and the $s$ equality constraints

$$e_j(\mathbf{x}) = 0, \qquad j = 1, 2, \ldots, s,$$

we could consider the function

$$F(\mathbf{x}) = f(\mathbf{x}) + \sum_{i=1}^{m} \lambda_i c_i^2(\mathbf{x})\,H[c_i(\mathbf{x})] + \sum_{j=1}^{s} \lambda_j' e_j^2(\mathbf{x}), \qquad \text{(C13)}$$

where $H(q)$ is the (0, 1) step function

$$H(q) = \begin{cases} 0, & q > 0 \\ 1, & q < 0, \end{cases}$$

and $\lambda_i$, $\lambda_j'$ are positive scaling factors, chosen so that the contributions of
the various terms to (C13) are approximately equal. The penalty is thus the
weighted sum of squares of the amount by which the constraints are violated.
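A function of the form (C13) might be built as in the following sketch. The helper names `penalized` and `golden_section`, the one-variable test problem and the value of the scaling factor are assumptions. A golden-section search stands in for whichever unconstrained minimizer is preferred.

```python
import math

def penalized(f, ineqs, eqs, lam, lam_eq):
    # Builds F(x) of equation (C13) from f, the c_i, the e_j and the weights.
    def F(x):
        total = f(x)
        for weight, c in zip(lam, ineqs):
            cx = c(x)
            if cx < 0.0:                     # H[c(x)] = 1 only when violated
                total += weight * cx * cx
        for weight, e in zip(lam_eq, eqs):
            total += weight * e(x) ** 2
        return total
    return F

def golden_section(func, a, b, tol=1e-10):
    # Univariate minimizer for a unimodal function on [a, b].
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while abs(b - a) > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if func(c) < func(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

# Illustrative problem: minimize (x - 3)^2 subject to 2 - x >= 0.
weight = 1000.0
F = penalized(lambda x: (x - 3.0) ** 2, [lambda x: 2.0 - x], [], [weight], [])
x_pen = golden_section(F, 0.0, 5.0)
x_expected = (3.0 + 2.0 * weight) / (1.0 + weight)   # analytic minimum of F
```

The minimum of $F$ lies slightly outside the feasible region, at about 2.001, and approaches the constrained minimum $x = 2$ as the scaling factor is increased. This shows why the method requires $f$ to be calculable at non-feasible points.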

This method works reasonably well in practice, but has the disadvantage
of requiring that values of $f(\mathbf{x})$ be calculated at non-feasible points, and this
may not always be possible. A method which restricts the search to feasible
points only is due to Carroll, and is known as Carroll's created response
surface technique. In this method if the constraints are inequalities, the
surface

$$F(\mathbf{x}, k) = f(\mathbf{x}) + k \sum_{i=1}^{m} \frac{1}{c_i(\mathbf{x})},$$

is considered, where $k > 0$, and a minimum found as a function of $\mathbf{x}$. This
minimum point is then used as the starting value for a new minimization
for a reduced value of $k$, and the procedure repeated until $k = 0$ is reached.
In all minimizations non-feasible points are excluded. The theoretical
development of this method, and its extension to incorporate equality
constraints may be found in the book of Kowalik and Osborne.
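The created response surface procedure can be sketched for the same kind of one-variable problem. The test problem, the sequence of values of $k$ (reduced by a factor of ten each time, stopping at a small positive value rather than at exactly zero), and the use of a golden-section search confined to the feasible interval are assumptions of this illustration. The previous minimum is used as the lower end of the next search interval, which is valid here because the minimum moves steadily towards the boundary as $k$ is reduced.

```python
import math

def golden_section(func, a, b, tol=1e-10):
    # Univariate minimizer for a unimodal function on [a, b].
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while abs(b - a) > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if func(c) < func(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

def f(x):
    return (x - 3.0) ** 2          # illustrative objective

def c(x):
    return 2.0 - x                 # single inequality constraint, c(x) > 0

def created_response(k):
    # F(x, k) = f(x) + k / c(x), defined only at feasible points.
    return lambda x: f(x) + k / c(x)

lower = 0.0
minima = []
k = 1.0
for _ in range(7):
    # Search only feasible points: the interval stops just short of x = 2.
    x_k = golden_section(created_response(k), lower, 2.0 - 1e-9)
    minima.append(x_k)
    lower = x_k                    # start the next search from this minimum
    k *= 0.1
```

Every point examined is feasible, and the successive minima increase towards the constrained minimum at $x = 2$ from inside the feasible region.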
Table D1

Ordinates of the standard normal density function

$$f(x) = \frac{1}{\sqrt{2\pi}} \exp(-x^2/2).$$

Note that

$$f(-x) = f(x).$$

  x     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09

 0.0   .3989  .3989  .3989  .3988  .3986  .3984  .3982  .3980  .3977  .3973
 0.1   .3970  .3965  .3961  .3956  .3951  .3945  .3939  .3932  .3925  .3918
 0.2   .3910  .3902  .3894  .3885  .3876  .3867  .3857  .3847  .3836  .3825
 0.3   .3814  .3802  .3790  .3778  .3765  .3752  .3739  .3725  .3712  .3697
 0.4   .3683  .3668  .3653  .3637  .3621  .3605  .3589  .3572  .3555  .3538
 0.5   .3521  .3503  .3485  .3467  .3448  .3429  .3410  .3391  .3372  .3352
 0.6   .3332  .3312  .3292  .3271  .3251  .3230  .3209  .3187  .3166  .3144
 0.7   .3123  .3101  .3079  .3056  .3034  .3011  .2989  .2966  .2943  .2920
 0.8   .2897  .2874  .2850  .2827  .2803  .2780  .2756  .2732  .2709  .2685
 0.9   .2661  .2637  .2613  .2589  .2565  .2541  .2516  .2492  .2468  .2444
 1.0   .2420  .2396  .2371  .2347  .2323  .2299  .2275  .2251  .2227  .2205
 1.1   .2179  .2155  .2131  .2107  .2083  .2059  .2036  .2012  .1989  .1963
 1.2   .1942  .1919  .1895  .1872  .1849  .1826  .1804  .1781  .1758  .1735
 1.3   .1714  .1691  .1669  .1647  .1626  .1604  .1582  .1561  .1539  .1516
 1.4   .1497  .1476  .1456  .1435  .1415  .1394  .1374  .1354  .1334  .1318
 1.5   .1295  .1276  .1257  .1238  .1219  .1200  .1182  .1163  .1145  .1127
 1.6   .1109  .1092  .1074  .1057  .1040  .1023  .1006  .0989  .0973  .0957
 1.7   .0940  .0925  .0909  .0893  .0878  .0863  .0848  .0833  .0818  .0804
 1.8   .0790  .0775  .0761  .0748  .0734  .0721  .0707  .0694  .0681  .0669
 1.9   .0656  .0644  .0632  .0620  .0608  .0596  .0584  .0573  .0562  .0551
 2.0   .0540  .0529  .0519  .0508  .0498  .0488  .0478  .0468  .0459  .0449
 2.1   .0440  .0431  .0422  .0413  .0404  .0396  .0387  .0379  .0371  .0363
 2.2   .0355  .0347  .0339  .0332  .0325  .0317  .0310  .0303  .0297  .0290
 2.3   .0283  .0277  .0270  .0264  .0258  .0252  .0246  .0241  .0235  .0229
 2.4   .0224  .0219  .0213  .0208  .0203  .0198  .0194  .0189  .0184  .0180
 2.5   .0175  .0171  .0167  .0163  .0158  .0154  .0151  .0147  .0143  .0139
 2.6   .0136  .0132  .0129  .0126  .0122  .0119  .0116  .0113  .0110  .0107
 2.7   .0104  .0101  .0099  .0096  .0093  .0091  .0088  .0086  .0084  .0081
 2.8   .0079  .0077  .0075  .0073  .0071  .0069  .0067  .0065  .0063  .0061
 2.9   .0060  .0058  .0056  .0055  .0053  .0051  .0050  .0048  .0047  .0046
 3.0   .0044  .0043  .0042  .0040  .0039  .0038  .0037  .0036  .0035  .0034
 3.1   .0033  .0032  .0031  .0030  .0029  .0028  .0027  .0026  .0025  .0025
 3.2   .0024  .0023  .0022  .0022  .0021  .0020  .0020  .0019  .0018  .0018
 3.3   .0017  .0017  .0016  .0016  .0015  .0015  .0014  .0014  .0013  .0013
 3.4   .0012  .0012  .0012  .0011  .0011  .0010  .0010  .0010  .0009  .0009
 3.5   .0009  .0008  .0008  .0008  .0008  .0007  .0007  .0007  .0007  .0006
 3.6   .0006  .0006  .0006  .0005  .0005  .0005  .0005  .0005  .0005  .0004
 3.7   .0004  .0004  .0004  .0004  .0004  .0004  .0003  .0003  .0003  .0003
 3.8   .0003  .0003  .0003  .0003  .0003  .0002  .0002  .0002  .0002  .0002
 3.9   .0002  .0002  .0002  .0002  .0002  .0002  .0002  .0002  .0001  .0001
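Entries of Table D1 can be regenerated, or values between the tabulated points obtained, directly from the defining formula. The function name `normal_pdf` in this short sketch is an assumption.

```python
import math

def normal_pdf(x):
    # Ordinate of the standard normal density, f(x) = exp(-x^2/2) / sqrt(2 pi).
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

ordinate_at_0 = normal_pdf(0.0)
ordinate_at_1 = normal_pdf(1.0)
ordinate_at_2 = normal_pdf(2.0)
```

To four decimal places these reproduce the tabulated entries for $x = 0.00$, $1.00$ and $2.00$.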
Table D2

Standard normal distribution function

Note that

$$F(-x) = 1 - F(x).$$

  x     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09

 0.0   .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
 0.1   .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
 0.2   .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
 0.3   .6179  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480  .6517
 0.4   .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
 0.5   .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
 0.6   .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
 0.7   .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
 0.8   .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
 0.9   .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
 1.0   .8413  .8438  .8461  .8485  .8508  .8531  .8554  .8577  .8599  .8621
 1.1   .8643  .8665  .8686  .8708  .8729  .8749  .8770  .8790  .8810  .8830
 1.2   .8849  .8869  .8888  .8907  .8925  .8944  .8962  .8980  .8997  .9015
 1.3   .9032  .9049  .9066  .9082  .9099  .9115  .9131  .9147  .9162  .9177
 1.4   .9192  .9207  .9222  .9236  .9251  .9265  .9279  .9292  .9306  .9319
 1.5   .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
 1.6   .9452  .9463  .9474  .9484  .9495  .9505  .9515  .9525  .9535  .9545
 1.7   .9554  .9564  .9573  .9582  .9591  .9599  .9608  .9616  .9625  .9633
 1.8   .9641  .9649  .9656  .9664  .9671  .9678  .9686  .9693  .9699  .9706
 1.9   .9713  .9719  .9726  .9732  .9738  .9744  .9750  .9756  .9761  .9767
 2.0   .9772  .9778  .9783  .9788  .9793  .9798  .9803  .9808  .9812  .9817
 2.1   .9821  .9826  .9830  .9834  .9838  .9842  .9846  .9850  .9854  .9857
 2.2   .9861  .9864  .9868  .9871  .9875  .9878  .9881  .9884  .9887  .9890
 2.3   .9893  .9896  .9898  .9901  .9904  .9906  .9909  .9911  .9913  .9916
 2.4   .9918  .9920  .9922  .9925  .9927  .9929  .9931  .9932  .9934  .9936
 2.5   .9938  .9940  .9941  .9943  .9945  .9946  .9948  .9949  .9951  .9952
 2.6   .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
 2.7   .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
 2.8   .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981
 2.9   .9981  .9982  .9982  .9983  .9984  .9984  .9985  .9985  .9986  .9986
 3.0   .9987  .9987  .9987  .9988  .9988  .9989  .9989  .9989  .9990  .9990
 3.1   .9990  .9991  .9991  .9991  .9992  .9992  .9992  .9992  .9993  .9993
 3.2   .9993  .9993  .9994  .9994  .9994  .9994  .9994  .9995  .9995  .9995
 3.3   .9995  .9995  .9995  .9996  .9996  .9996  .9996  .9996  .9996  .9997
 3.4   .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9998

 x              1.282  1.645  1.960  2.326  2.576  3.090  3.291   3.891    4.417
 F(x)           .90    .95    .975   .99    .995   .999   .9995   .99995   .999995
 2[1 - F(x)]    .20    .10    .05    .02    .01    .002   .001    .0001    .00001
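Values of the distribution function can also be computed from the error function, using the standard relation $F(x) = \tfrac{1}{2}[1 + \mathrm{erf}(x/\sqrt{2})]$. The function name `normal_cdf` in this short sketch is an assumption.

```python
import math

def normal_cdf(x):
    # Standard normal distribution function via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

F_at_1 = normal_cdf(1.0)
F_at_196 = normal_cdf(1.96)
two_tail_196 = 2.0 * (1.0 - normal_cdf(1.96))
```

These agree with the tabulated entries for $x = 1.00$ and $x = 1.96$, and with the two-tailed probability of .05 quoted for $x = 1.960$.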
The probability of exactly $r$ successes in $n$ independent Bernoulli trials, with the
probability of a success in a single trial equal to $p$, is given by the $(r + 1)$th term in
the binomial expansion of $(q + p)^n$, i.e.

$$f(r) = \binom{n}{r} p^r q^{n-r}, \qquad r = 0, 1, \ldots, n; \quad q + p = 1.$$

The probability of obtaining $s$ or more successes is given by

$$F = \sum_{r=s}^{n} \binom{n}{r} p^r q^{n-r}.$$

This table gives values of $F$ for specified values of $n$, $s$ and $p$. If $p > 0.5$ the values
for $F$ are obtained from

$$1 - \sum_{r=n-s+1}^{n} \binom{n}{r} q^r p^{n-r}.$$
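The tabulated quantity is simple to compute directly, which is convenient for values of $n$, $s$ or $p$ not covered by the table. In this sketch the function names are assumptions and `math.comb` requires Python 3.8 or later. The second function evaluates the expression given above for $p > 0.5$.

```python
import math

def binomial_tail(n, s, p):
    # Probability of s or more successes in n trials.
    q = 1.0 - p
    return sum(math.comb(n, r) * p ** r * q ** (n - r) for r in range(s, n + 1))

def binomial_tail_large_p(n, s, p):
    # Same probability, written in terms of the number of failures.
    q = 1.0 - p
    return 1.0 - sum(math.comb(n, r) * q ** r * p ** (n - r)
                     for r in range(n - s + 1, n + 1))

F_2_1 = binomial_tail(2, 1, 0.05)
F_5_3 = binomial_tail(5, 3, 0.50)
direct = binomial_tail(6, 2, 0.70)
via_complement = binomial_tail_large_p(6, 2, 0.70)
```

The first two values correspond to the entries for $n = 2$, $s = 1$, $p = .05$ and $n = 5$, $s = 3$, $p = .50$, and the last two show that the expression for $p > 0.5$ gives the same result as the direct sum.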


                                          p

  n   s    .05    .10    .15    .20    .25    .30    .35    .40    .45    .50

  2   1   .0975  .1900  .2775  .3600  .4375  .5100  .5775  .6400  .6975  .7500
      2   .0025  .0100  .0225  .0400  .0625  .0900  .1225  .1600  .2025  .2500

  3   1   .1426  .2710  .3859  .4880  .5781  .6570  .7254  .7840  .8336  .8750
      2   .0072  .0280  .0608  .1040  .1562  .2160  .2818  .3520  .4252  .5000
      3   .0001  .0010  .0034  .0080  .0156  .0270  .0429  .0640  .0911  .1250

  4   1   .1855  .3439  .4780  .5904  .6836  .7599  .8215  .8704  .9085  .9375
      2   .0140  .0523  .1095  .1808  .2617  .3483  .4370  .5248  .6090  .6875
      3   .0005  .0037  .0120  .0272  .0508  .0837  .1265  .1792  .2415  .3125
      4   .0000  .0001  .0005  .0016  .0039  .0081  .0150  .0256  .0410  .0625

  5   1   .2262  .4095  .5563  .6723  .7627  .8319  .8840  .9222  .9497  .9688
      2   .0226  .0815  .1648  .2627  .3672  .4718  .5716  .6630  .7438  .8125
      3   .0012  .0086  .0266  .0579  .1035  .1631  .2352  .3174  .4069  .5000
      4   .0000  .0005  .0022  .0067  .0156  .0308  .0540  .0870  .1312  .1875
      5   .0000  .0000  .0001  .0003  .0010  .0024  .0053  .0102  .0185  .0312

  6   1   .2649  .4686  .6229  .7379  .8220  .8824  .9246  .9533  .9723  .9844
      2   .0328  .1143  .2235  .3447  .4661  .5798  .6809  .7667  .8364  .8906
      3   .0022  .0158  .0473  .0989  .1694  .2557  .3529  .4557  .5585  .6562
      4   .0001  .0013  .0059  .0170  .0376  .0705  .1174  .1792  .2553  .3438
      5   .0000  .0001  .0004  .0016  .0046  .0109  .0223  .0410  .0692  .1094
      6   .0000  .0000  .0000  .0001  .0002  .0007  .0018  .0041  .0083  .0156

  7   1   .3017  .5217  .6794  .7903  .8665  .9176  .9510  .9720  .9848  .9922
      2   .0444  .1497  .2834  .4233  .5551  .6706  .7662  .8414  .8976  .9375
      3   .0038  .0257  .0738  .1480  .2436  .3529  .4677  .5801  .6836  .7734
      4   .0002  .0027  .0121  .0333  .0706  .1260  .1998  .2898  .3917  .5000
      5   .0000  .0002  .0012  .0047  .0129  .0288  .0556  .0963  .1529  .2266
      6   .0000  .0000  .0001  .0004  .0013  .0038  .0090  .0188  .0357  .0625
      7   .0000  .0000  .0000  .0000  .0001  .0002  .0006  .0016  .0037  .0078

  8   1   .3366  .5695  .7275  .8322  .8999  .9424  .9681  .9832  .9916  .9961
      2   .0572  .1869  .3428  .4967  .6329  .7447  .8309  .8936  .9368  .9648
      3   .0058  .0381  .1052  .2031  .3215  .4482  .5722  .6846  .7799  .8555
      4   .0004  .0050  .0214  .0563  .1138  .1941  .2936  .4059  .5230  .6367
      5   .0000  .0004  .0029  .0104  .0273  .0580  .1061  .1737  .2604  .3633
      6   .0000  .0000  .0002  .0012  .0042  .0113  .0253  .0498  .0885  .1445
      7   .0000  .0000  .0000  .0001  .0004  .0013  .0036  .0085  .0181  .0352
      8   .0000  .0000  .0000  .0000  .0000  .0001  .0002  .0007  .0017  .0039

  9   1   .3698  .6126  .7684  .8658  .9249  .9596  .9793  .9899  .9954  .9980
      2   .0712  .2252  .4005  .5638  .6997  .8040  .8789  .9295  .9615  .9805
      3   .0084  .0530  .1409  .2618  .3993  .5372  .6627  .7682  .8505  .9102
      4   .0006  .0083  .0339  .0856  .1657  .2703  .3911  .5174  .6386  .7461
      5   .0000  .0009  .0056  .0196  .0489  .0988  .1717  .2666  .3786  .5000
      6   .0000  .0001  .0006  .0031  .0100  .0253  .0536  .0994  .1658  .2539
      7   .0000  .0000  .0000  .0003  .0013  .0043  .0112  .0250  .0498  .0898
      8   .0000  .0000  .0000  .0000  .0001  .0004  .0014  .0038  .0091  .0195
      9   .0000  .0000  .0000  .0000  .0000  .0000  .0001  .0003  .0008  .0020

 10   1   .4013  .6513  .8031  .8926  .9437  .9718  .9865  .9940  .9975  .9990
      2   .0861  .2639  .4557  .6242  .7560  .8507  .9140  .9536  .9767  .9893
      3   .0115  .0702  .1798  .3222  .4744  .6172  .7384  .8327  .9004  .9453
      4   .0010  .0128  .0500  .1209  .2241  .3504  .4862  .6177  .7340  .8281
      5   .0001  .0016  .0099  .0328  .0781  .1503  .2485  .3669  .4956  .6230
      6   .0000  .0001  .0014  .0064  .0197  .0473  .0949  .1662  .2616  .3770
      7   .0000  .0000  .0001  .0009  .0035  .0106  .0260  .0548  .1020  .1719
      8   .0000  .0000  .0000  .0001  .0004  .0016  .0048  .0123  .0274  .0547
      9   .0000  .0000  .0000  .0000  .0000  .0001  .0005  .0017  .0045  .0107
     10   .0000  .0000  .0000  .0000  .0000  .0000  .0000  .0001  .0003  .0010

 11   1   .4312  .6862  .8327  .9141  .9578  .9802  .9912  .9964  .9986  .9995
      2   .1019  .3026  .5078  .6779  .8029  .8870  .9394  .9698  .9861  .9941
      3   .0152  .0896  .2212  .3826  .5448  .6873  .7999  .8811  .9348  .9673
      4   .0016  .0185  .0694  .1611  .2867  .4304  .5744  .7037  .8089  .8867
      5   .0001  .0028  .0159  .0504  .1146  .2103  .3317  .4672  .6029  .7256
      6   .0000  .0003  .0027  .0117  .0343  .0782  .1487  .2465  .3669  .5000
      7   .0000  .0000  .0003  .0020  .0076  .0216  .0501  .0994  .1738  .2744
      8   .0000  .0000  .0000  .0002  .0012  .0043  .0122  .0293  .0610  .1133
      9   .0000  .0000  .0000  .0000  .0001  .0006  .0020  .0059  .0148  .0327
     10   .0000  .0000  .0000  .0000  .0000  .0000  .0002  .0007  .0022  .0059
     11   .0000  .0000  .0000  .0000  .0000  .0000  .0000  .0000  .0002  .0005

 12   1   .4596  .7176  .8578  .9313  .9683  .9862  .9943  .9978  .9992  .9998
      2   .1184  .3410  .5565  .7251  .8416  .9150  .9576  .9804  .9917  .9968
      3   .0196  .1109  .2642  .4417  .6093  .7472  .8487  .9166  .9579  .9807
      4   .0022  .0256  .0922  .2054  .3512  .5075  .6533  .7747  .8655  .9270
      5   .0002  .0043  .0239  .0726  .1576  .2763  .4167  .5618  .6956  .8062
      6   .0000  .0005  .0046  .0194  .0544  .1178  .2127  .3348  .4731  .6128
      7   .0000  .0001  .0007  .0039  .0143  .0386  .0846  .1582  .2607  .3872
      8   .0000  .0000  .0001  .0006  .0028  .0095  .0255  .0573  .1117  .1938
      9   .0000  .0000  .0000  .0001  .0004  .0017  .0056  .0153  .0356  .0730
     10   .0000  .0000  .0000  .0000  .0000  .0002  .0008  .0028  .0079  .0193
     11   .0000  .0000  .0000  .0000  .0000  .0000  .0001  .0003  .0011  .0032
     12   .0000  .0000  .0000  .0000  .0000  .0000  .0000  .0000  .0001  .0002

 13   1   .4867  .7458  .8791  .9450  .9762  .9903  .9963  .9987  .9996  .9999
      2   .1354  .3787  .6017  .7664  .8733  .9363  .9704  .9874  .9951  .9983
      3   .0245  .1339  .2704  .4983  .6674  .7975  .8868  .9421  .9731  .9888
      4   .0031  .0342  .0967  .2527  .4157  .5794  .7217  .8314  .9071  .9539
      5   .0003  .0065  .0260  .0991  .2060  .3457  .4995  .6470  .7721  .8666
      6   .0000  .0009  .0053  .0300  .0802  .1654  .2841  .4256  .5732  .7095
      7   .0000  .0001  .0013  .0070  .0243  .0624  .1295  .2288  .3563  .5000
      8   .0000  .0000  .0002  .0012  .0056  .0182  .0462  .0977  .1788  .2905
      9   .0000  .0000  .0000  .0002  .0010  .0040  .0126  .0321  .0698  .1334
     10   .0000  .0000  .0000  .0000  .0001  .0007  .0025  .0078  .0203  .0461
     11   .0000  .0000  .0000  .0000  .0000  .0001  .0003  .0013  .0041  .0112
     12   .0000  .0000  .0000  .0000  .0000  .0000  .0000  .0001  .0005  .0017
     13   .0000  .0000  .0000  .0000  .0000  .0000  .0000  .0000  .0000  .0001

 14   1   .5123  .7712  .8972  .9560  .9822  .9932  .9976  .9992  .9998  .9999
      2   .1530  .4154  .6433  .8021  .8990  .9525  .9795  .9919  .9971  .9991
      3   .0301  .1584  .3521  .5519  .7189  .8392  .9161  .9602  .9830  .9935
      4   .0042  .0441  .1465  .3018  .4787  .6448  .7795  .8757  .9368  .9713
      5   .0004  .0092  .0467  .1298  .2585  .4158  .5773  .7207  .8328  .9102
      6   .0000  .0015  .0115  .0439  .1117  .2195  .3595  .5141  .6627  .7880
      7   .0000  .0002  .0022  .0116  .0383  .0933  .1836  .3075  .4539  .6047
      8   .0000  .0000  .0003  .0024  .0103  .0315  .0753  .1501  .2586  .3953
      9   .0000  .0000  .0000  .0004  .0022  .0083  .0243  .0583  .1189  .2120

10 

•0000 

•0000 

•0000 

•0000 

•0003 

•0017 

•0060 

•0175 

•0426 

•0898 

11 

•0000 

•0000 

•0000 

•0000 

•0000 

•0002 

•0011 

•0039 

•0114 

•0287 

12 

•0000 

•0000 

•0000 

•0000 

•0000 

•0000 

•0001 

•0006 

•0022 

•0065 

13 

•0000 

•0000 

•0000 

•0000 

•0000 

•0000 

•0000 

•0001 

•0003 

•0009 

14 

•0000 

•0000 

•0000 

•0000 

•0000 

•0000 

•0000 

•0000 

•0000 

•0001 
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n s -05 -10 15 -20 -25 -30 -35 -40 -45 -50 


15   1   .5367   .7941   .9126   .9648   .9866   .9953   .9984   .9995   .9999  1.0000
     2   .1710   .4510   .6814   .8329   .9198   .9647   .9858   .9948   .9983   .9995
     3   .0362   .1841   .3958   .6020   .7639   .8732   .9383   .9729   .9893   .9963
     4   .0055   .0556   .1773   .3518   .5387   .7031   .8273   .9095   .9576   .9824
     5   .0006   .0127   .0617   .1642   .3135   .4845   .6481   .7827   .8796   .9408
     6   .0001   .0022   .0168   .0611   .1484   .2784   .4357   .5968   .7392   .8491
     7   .0000   .0003   .0036   .0181   .0566   .1311   .2452   .3902   .5478   .6964
     8   .0000   .0000   .0006   .0042   .0173   .0500   .1132   .2131   .3465   .5000
     9   .0000   .0000   .0001   .0008   .0042   .0152   .0422   .0950   .1818   .3036
    10   .0000   .0000   .0000   .0001   .0008   .0037   .0124   .0338   .0769   .1509
    11   .0000   .0000   .0000   .0000   .0001   .0007   .0028   .0093   .0255   .0592
    12   .0000   .0000   .0000   .0000   .0000   .0001   .0005   .0019   .0063   .0176
    13   .0000   .0000   .0000   .0000   .0000   .0000   .0001   .0003   .0011   .0037
    14   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0001   .0005
    15   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000

16   1   .5599   .8147   .9257   .9719   .9900   .9967   .9990   .9997   .9999  1.0000
     2   .1892   .4853   .7161   .8593   .9365   .9739   .9902   .9967   .9990   .9997
     3   .0429   .2108   .4386   .6482   .8029   .9006   .9549   .9817   .9934   .9979
     4   .0070   .0684   .2101   .4019   .5950   .7541   .8661   .9349   .9719   .9894
     5   .0009   .0170   .0791   .2018   .3698   .5501   .7108   .8334   .9147   .9616
     6   .0001   .0033   .0235   .0817   .1897   .3402   .5100   .6712   .8024   .8949
     7   .0000   .0005   .0056   .0267   .0796   .1753   .3119   .4728   .6340   .7728
     8   .0000   .0001   .0011   .0070   .0271   .0744   .1594   .2839   .4371   .5982
     9   .0000   .0000   .0002   .0015   .0075   .0257   .0671   .1423   .2559   .4018
    10   .0000   .0000   .0000   .0002   .0016   .0071   .0229   .0583   .1241   .2272
    11   .0000   .0000   .0000   .0000   .0003   .0016   .0062   .0191   .0486   .1051
    12   .0000   .0000   .0000   .0000   .0000   .0003   .0013   .0049   .0149   .0384
    13   .0000   .0000   .0000   .0000   .0000   .0000   .0002   .0009   .0035   .0106
    14   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0001   .0006   .0021
    15   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0001   .0003
    16   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000

17   1   .5819   .8332   .9369   .9775   .9925   .9977   .9993   .9998  1.0000  1.0000
     2   .2078   .5182   .7475   .8818   .9499   .9807   .9933   .9979   .9994   .9999
     3   .0503   .2382   .4802   .6904   .8363   .9226   .9673   .9877   .9959   .9988
     4   .0088   .0826   .2444   .4511   .6470   .7981   .8972   .9536   .9816   .9936
     5   .0012   .0221   .0987   .2418   .4261   .6113   .7652   .8740   .9404   .9755
     6   .0001   .0047   .0319   .1057   .2347   .4032   .5803   .7361   .8529   .9283
     7   .0000   .0008   .0083   .0377   .1071   .2248   .3812   .5522   .7098   .8338
     8   .0000   .0001   .0017   .0109   .0402   .1046   .2128   .3595   .5257   .6855
     9   .0000   .0000   .0003   .0026   .0124   .0403   .0994   .1989   .3374   .5000
    10   .0000   .0000   .0000   .0005   .0031   .0127   .0383   .0919   .1834   .3145
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P 

 n   s     .05     .10     .15     .20     .25     .30     .35     .40     .45     .50

17  11   .0000   .0000   .0000   .0001   .0006   .0032   .0120   .0348   .0826   .1662
    12   .0000   .0000   .0000   .0000   .0001   .0007   .0030   .0106   .0301   .0717
    13   .0000   .0000   .0000   .0000   .0000   .0001   .0006   .0025   .0086   .0245
    14   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0005   .0019   .0064
    15   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0001   .0003   .0012
    16   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0001
    17   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000

18   1   .6028   .8499   .9464   .9820   .9944   .9984   .9996   .9999  1.0000  1.0000
     2   .2265   .5497   .7759   .9009   .9605   .9858   .9954   .9987   .9997   .9999
     3   .0581   .2662   .5203   .7287   .8647   .9400   .9764   .9918   .9975   .9993
     4   .0109   .0982   .2798   .4990   .6943   .8354   .9217   .9672   .9880   .9962
     5   .0015   .0282   .1206   .2836   .4813   .6673   .8114   .9058   .9589   .9846
     6   .0002   .0064   .0419   .1329   .2825   .4656   .6450   .7912   .8923   .9519
     7   .0000   .0012   .0118   .0513   .1390   .2783   .4509   .6257   .7742   .8811
     8   .0000   .0002   .0027   .0163   .0569   .1407   .2717   .4366   .6085   .7597
     9   .0000   .0000   .0005   .0043   .0193   .0596   .1391   .2632   .4222   .5927
    10   .0000   .0000   .0001   .0009   .0054   .0210   .0597   .1347   .2527   .4073
    11   .0000   .0000   .0000   .0002   .0012   .0061   .0212   .0576   .1280   .2403
    12   .0000   .0000   .0000   .0000   .0002   .0014   .0062   .0203   .0537   .1189
    13   .0000   .0000   .0000   .0000   .0000   .0003   .0014   .0058   .0183   .0481
    14   .0000   .0000   .0000   .0000   .0000   .0000   .0003   .0013   .0049   .0154
    15   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0002   .0010   .0038
    16   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0001   .0007
    17   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0001
    18   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000

19   1   .6226   .8649   .9544   .9856   .9958   .9989   .9997   .9999  1.0000  1.0000
     2   .2453   .5797   .8015   .9171   .9690   .9896   .9969   .9992   .9998  1.0000
     3   .0665   .2946   .5587   .7631   .8887   .9538   .9830   .9945   .9985   .9996
     4   .0132   .1150   .3159   .5449   .7369   .8668   .9409   .9770   .9923   .9978
     5   .0020   .0352   .1444   .3267   .5346   .7178   .8500   .9304   .9720   .9904
     6   .0002   .0086   .0537   .1631   .3322   .5261   .7032   .8371   .9223   .9682
     7   .0000   .0017   .0163   .0676   .1749   .3345   .5188   .6919   .8273   .9165
     8   .0000   .0003   .0041   .0233   .0775   .1820   .3344   .5122   .6831   .8204
     9   .0000   .0000   .0008   .0067   .0287   .0839   .1855   .3325   .5060   .6762
    10   .0000   .0000   .0001   .0016   .0089   .0326   .0875   .1861   .3290   .5000
    11   .0000   .0000   .0000   .0003   .0023   .0105   .0347   .0885   .1841   .3238
    12   .0000   .0000   .0000   .0000   .0005   .0028   .0114   .0352   .0871   .1796
    13   .0000   .0000   .0000   .0000   .0001   .0006   .0031   .0116   .0342   .0835
    14   .0000   .0000   .0000   .0000   .0000   .0001   .0007   .0031   .0109   .0318
    15   .0000   .0000   .0000   .0000   .0000   .0000   .0001   .0006   .0028   .0096
    16   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0001   .0005   .0022
    17   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0001   .0004
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P 


 n   s     .05     .10     .15     .20     .25     .30     .35     .40     .45     .50

19  18   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000
    19   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000

20   1   .6415   .8784   .9612   .9885   .9968   .9992   .9998  1.0000  1.0000  1.0000
     2   .2642   .6083   .8244   .9308   .9757   .9924   .9979   .9995   .9999  1.0000
     3   .0755   .3231   .5951   .7939   .9087   .9645   .9879   .9964   .9991   .9998
     4   .0159   .1330   .3523   .5886   .7748   .8929   .9556   .9840   .9951   .9987
     5   .0026   .0432   .1702   .3704   .5852   .7625   .8818   .9490   .9811   .9941
     6   .0003   .0113   .0673   .1958   .3828   .5836   .7546   .8744   .9447   .9793
     7   .0000   .0024   .0219   .0867   .2142   .3920   .5834   .7500   .8701   .9423
     8   .0000   .0004   .0059   .0321   .1018   .2277   .3990   .5841   .7480   .8684
     9   .0000   .0001   .0013   .0100   .0409   .1133   .2376   .4044   .5857   .7483
    10   .0000   .0000   .0002   .0026   .0139   .0480   .1218   .2447   .4086   .5881
    11   .0000   .0000   .0000   .0006   .0039   .0171   .0532   .1275   .2493   .4119
    12   .0000   .0000   .0000   .0001   .0009   .0051   .0196   .0565   .1308   .2517
    13   .0000   .0000   .0000   .0000   .0002   .0013   .0060   .0210   .0580   .1316
    14   .0000   .0000   .0000   .0000   .0000   .0003   .0015   .0065   .0214   .0577
    15   .0000   .0000   .0000   .0000   .0000   .0000   .0003   .0016   .0064   .0207
    16   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0003   .0015   .0059
    17   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0003   .0013
    18   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0002
    19   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000
    20   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000
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The Poisson density function is given by 


$$f(x) = \frac{e^{-m} m^{x}}{x!}, \qquad m > 0, \quad x = 0, 1, 2, \ldots$$

This table gives values of

$$F = \sum_{x = x'}^{\infty} \frac{e^{-m} m^{x}}{x!}$$

for specified values of $x'$ and $m$.
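
Any entry of the table can be regenerated from the definition, since F is simply one minus the sum of the first x' terms of the Poisson density. The following is a minimal sketch, assuming only the Python standard library; the function name `poisson_upper_tail` is illustrative and is not taken from the text.

```python
import math


def poisson_upper_tail(x_min: int, m: float) -> float:
    """Return F = P(X >= x_min) for a Poisson variable with mean m > 0."""
    if x_min <= 0:
        return 1.0  # the sum over all x = 0, 1, 2, ... is unity
    term = math.exp(-m)  # f(0) = e^{-m}
    lower = term         # running value of f(0) + ... + f(x_min - 1)
    for x in range(1, x_min):
        term *= m / x    # recurrence f(x) = f(x - 1) * m / x
        lower += term
    return 1.0 - lower


if __name__ == "__main__":
    # Print the first few rows of the column m = 1.0, to four decimals.
    for x_min in range(5):
        print(x_min, f"{poisson_upper_tail(x_min, 1.0):.4f}")
```

For example, `poisson_upper_tail(1, 1.0)` is 1 - e^{-1}, about 0.6321, which is the entry for m = 1.0, x' = 1. Because the tail is formed by subtraction, this sketch loses relative accuracy when F is very small; it is adequate for the four decimal places tabulated here.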


                                          m
x'     0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9     1.0

 0  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
 1   .0952   .1813   .2592   .3297   .3935   .4512   .5034   .5507   .5934   .6321
 2   .0047   .0175   .0369   .0616   .0902   .1219   .1558   .1912   .2275   .2642
 3   .0002   .0011   .0036   .0079   .0144   .0231   .0341   .0474   .0629   .0803
 4   .0000   .0001   .0003   .0008   .0018   .0034   .0058   .0091   .0135   .0190
 5   .0000   .0000   .0000   .0001   .0002   .0004   .0008   .0014   .0023   .0037
 6   .0000   .0000   .0000   .0000   .0000   .0000   .0001   .0002   .0003   .0006
 7   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0001


                                          m
x'     1.1     1.2     1.3     1.4     1.5     1.6     1.7     1.8     1.9     2.0

 0  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
 1   .6671   .6988   .7275   .7534   .7769   .7981   .8173   .8347   .8504   .8647
 2   .3010   .3374   .3732   .4082   .4422   .4751   .5068   .5372   .5663   .5940
 3   .0996   .1205   .1429   .1665   .1912   .2166   .2428   .2694   .2963   .3233
 4   .0257   .0338   .0431   .0537   .0656   .0788   .0932   .1087   .1253   .1429
 5   .0054   .0077   .0107   .0143   .0186   .0237   .0296   .0364   .0441   .0527
 6   .0010   .0015   .0022   .0032   .0045   .0060   .0080   .0104   .0132   .0166
 7   .0001   .0003   .0004   .0006   .0009   .0013   .0019   .0026   .0034   .0045
 8   .0000   .0000   .0001   .0001   .0002   .0003   .0004   .0006   .0008   .0011
 9   .0000   .0000   .0000   .0000   .0000   .0000   .0001   .0001   .0002   .0002


                                          m
x'     2.1     2.2     2.3     2.4     2.5     2.6     2.7     2.8     2.9     3.0

 0  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
 1   .8775   .8892   .8997   .9093   .9179   .9257   .9328   .9392   .9450   .9502
 2   .6204   .6454   .6691   .6916   .7127   .7326   .7513   .7689   .7854   .8009
 3   .3504   .3773   .4040   .4303   .4562   .4816   .5064   .5305   .5540   .5768
 4   .1614   .1806   .2007   .2213   .2424   .2640   .2859   .3081   .3304   .3528
 5   .0621   .0725   .0838   .0959   .1088   .1226   .1371   .1523   .1682   .1847
 6   .0204   .0249   .0300   .0357   .0420   .0490   .0567   .0651   .0742   .0839
 7   .0059   .0075   .0094   .0116   .0142   .0172   .0206   .0244   .0287   .0335
 8   .0015   .0020   .0026   .0033   .0042   .0053   .0066   .0081   .0099   .0119
 9   .0003   .0005   .0006   .0009   .0011   .0015   .0019   .0024   .0031   .0038
10   .0001   .0001   .0001   .0002   .0003   .0004   .0005   .0007   .0009   .0011
11   .0000   .0000   .0000   .0000   .0001   .0001   .0001   .0002   .0002   .0003
12   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0001   .0001
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Table D4 {continued) 


                                          m
x'     3.1     3.2     3.3     3.4     3.5     3.6     3.7     3.8     3.9     4.0

 0  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
 1   .9550   .9592   .9631   .9666   .9698   .9727   .9753   .9776   .9798   .9817
 2   .8153   .8288   .8414   .8532   .8641   .8743   .8838   .8926   .9008   .9084
 3   .5988   .6201   .6406   .6603   .6792   .6973   .7146   .7311   .7469   .7619
 4   .3752   .3975   .4197   .4416   .4634   .4848   .5058   .5265   .5468   .5665
 5   .2018   .2194   .2374   .2558   .2746   .2936   .3128   .3322   .3516   .3712
 6   .0943   .1054   .1171   .1295   .1424   .1559   .1699   .1844   .1994   .2149
 7   .0388   .0446   .0510   .0579   .0653   .0732   .0818   .0909   .1005   .1107
 8   .0142   .0168   .0198   .0231   .0267   .0308   .0352   .0401   .0454   .0511
 9   .0047   .0057   .0069   .0083   .0099   .0117   .0137   .0160   .0185   .0214
10   .0014   .0018   .0022   .0027   .0033   .0040   .0048   .0058   .0069   .0081
11   .0004   .0005   .0006   .0008   .0010   .0013   .0016   .0019   .0023   .0028
12   .0001   .0001   .0002   .0002   .0003   .0004   .0005   .0006   .0007   .0009
13   .0000   .0000   .0000   .0001   .0001   .0001   .0001   .0002   .0002   .0003
14   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0001   .0001


                                          m
x'     4.1     4.2     4.3     4.4     4.5     4.6     4.7     4.8     4.9     5.0

 0  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
 1   .9834   .9850   .9864   .9877   .9889   .9899   .9909   .9918   .9926   .9933
 2   .9155   .9220   .9281   .9337   .9389   .9437   .9482   .9523   .9561   .9596
 3   .7762   .7898   .8026   .8149   .8264   .8374   .8477   .8575   .8667   .8753
 4   .5858   .6046   .6228   .6406   .6577   .6743   .6903   .7058   .7207   .7350
 5   .3907   .4102   .4296   .4488   .4679   .4868   .5054   .5237   .5418   .5595
 6   .2307   .2469   .2633   .2801   .2971   .3142   .3316   .3490   .3665   .3840
 7   .1214   .1325   .1442   .1564   .1689   .1820   .1954   .2092   .2233   .2378
 8   .0573   .0639   .0710   .0786   .0866   .0951   .1040   .1133   .1231   .1334
 9   .0245   .0279   .0317   .0358   .0403   .0451   .0503   .0558   .0618   .0681
10   .0095   .0111   .0129   .0149   .0171   .0195   .0222   .0251   .0283   .0318
11   .0034   .0041   .0048   .0057   .0067   .0078   .0090   .0104   .0120   .0137
12   .0011   .0014   .0017   .0020   .0024   .0029   .0034   .0040   .0047   .0055
13   .0003   .0004   .0005   .0007   .0008   .0010   .0012   .0014   .0017   .0020
14   .0001   .0001   .0002   .0002   .0003   .0003   .0004   .0005   .0006   .0007
15   .0000   .0000   .0000   .0001   .0001   .0001   .0001   .0001   .0002   .0002
16   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0001   .0001
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Table D4 ( continued ) 



                                          m
x'     5.1     5.2     5.3     5.4     5.5     5.6     5.7     5.8     5.9     6.0

 0  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
 1   .9939   .9945   .9950   .9955   .9959   .9963   .9967   .9970   .9973   .9975
 2   .9628   .9658   .9686   .9711   .9734   .9756   .9776   .9794   .9811   .9826
 3   .8835   .8912   .8984   .9052   .9116   .9176   .9232   .9285   .9334   .9380
 4   .7487   .7619   .7746   .7867   .7983   .8094   .8200   .8300   .8396   .8488
 5   .5769   .5939   .6105   .6267   .6425   .6579   .6728   .6873   .7013   .7149
 6   .4016   .4191   .4365   .4539   .4711   .4881   .5050   .5217   .5381   .5543
 7   .2526   .2676   .2829   .2983   .3140   .3297   .3456   .3616   .3776   .3937
 8   .1440   .1551   .1665   .1783   .1905   .2030   .2159   .2290   .2424   .2560
 9   .0748   .0819   .0894   .0974   .1056   .1143   .1234   .1328   .1426   .1528
10   .0356   .0397   .0441   .0488   .0538   .0591   .0648   .0708   .0772   .0839
11   .0156   .0177   .0200   .0225   .0253   .0282   .0314   .0349   .0386   .0426
12   .0063   .0073   .0084   .0096   .0110   .0125   .0141   .0160   .0179   .0201
13   .0024   .0028   .0033   .0038   .0045   .0051   .0059   .0068   .0078   .0088
14   .0008   .0010   .0012   .0014   .0017   .0020   .0023   .0027   .0031   .0036
15   .0003   .0003   .0004   .0005   .0006   .0007   .0009   .0010   .0012   .0014
16   .0001   .0001   .0001   .0002   .0002   .0002   .0003   .0004   .0004   .0005
17   .0000   .0000   .0000   .0001   .0001   .0001   .0001   .0001   .0001   .0002
18   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0001

                                          m
x'     6.1     6.2     6.3     6.4     6.5     6.6     6.7     6.8     6.9     7.0

 0  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
 1   .9978   .9980   .9982   .9983   .9985   .9986   .9988   .9989   .9990   .9991
 2   .9841   .9854   .9866   .9877   .9887   .9897   .9905   .9913   .9920   .9927
 3   .9423   .9464   .9502   .9537   .9570   .9600   .9629   .9656   .9680   .9704
 4   .8575   .8658   .8736   .8811   .8882   .8948   .9012   .9072   .9129   .9182
 5   .7281   .7408   .7531   .7649   .7763   .7873   .7978   .8080   .8177   .8270
 6   .5702   .5859   .6012   .6163   .6310   .6453   .6594   .6730   .6863   .6993
 7   .4098   .4258   .4418   .4577   .4735   .4892   .5047   .5201   .5353   .5503
 8   .2699   .2840   .2983   .3127   .3272   .3419   .3567   .3715   .3864   .4013
 9   .1633   .1741   .1852   .1967   .2084   .2204   .2327   .2452   .2580   .2709
10   .0910   .0984   .1061   .1142   .1226   .1314   .1404   .1498   .1595   .1695
11   .0469   .0514   .0563   .0614   .0668   .0726   .0786   .0849   .0916   .0985
12   .0224   .0250   .0277   .0307   .0339   .0373   .0409   .0448   .0490   .0534
13   .0100   .0113   .0127   .0143   .0160   .0179   .0199   .0221   .0245   .0270
14   .0042   .0048   .0055   .0063   .0071   .0080   .0091   .0102   .0115   .0128
15   .0016   .0019   .0022   .0026   .0030   .0034   .0039   .0044   .0050   .0057
16   .0006   .0007   .0008   .0010   .0012   .0014   .0016   .0018   .0021   .0024
17   .0002   .0003   .0003   .0004   .0004   .0005   .0006   .0007   .0008   .0010
18   .0001   .0001   .0001   .0001   .0002   .0002   .0002   .0003   .0003   .0004
19   .0000   .0000   .0000   .0000   .0001   .0001   .0001   .0001   .0001   .0001
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APPENDIX D 


Table D4 (continued)


                                          m
x'     7.1     7.2     7.3     7.4     7.5     7.6     7.7     7.8     7.9     8.0

 0  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
 1   .9992   .9993   .9993   .9994   .9994   .9995   .9995   .9996   .9996   .9997
 2   .9933   .9939   .9944   .9949   .9953   .9957   .9961   .9964   .9967   .9970
 3   .9725   .9745   .9764   .9781   .9797   .9812   .9826   .9839   .9851   .9862
 4   .9233   .9281   .9326   .9368   .9409   .9446   .9482   .9515   .9547   .9576
 5   .8359   .8445   .8527   .8605   .8679   .8751   .8819   .8883   .8945   .9004
 6   .7119   .7241   .7360   .7474   .7586   .7693   .7797   .7897   .7994   .8088
 7   .5651   .5796   .5940   .6080   .6218   .6354   .6486   .6616   .6743   .6866
 8   .4162   .4311   .4459   .4607   .4754   .4900   .5044   .5188   .5330   .5470
 9   .2840   .2973   .3108   .3243   .3380   .3518   .3657   .3796   .3935   .4075
10   .1798   .1904   .2012   .2123   .2236   .2351   .2469   .2589   .2710   .2834
11   .1058   .1133   .1212   .1293   .1378   .1465   .1555   .1648   .1743   .1841
12   .0580   .0629   .0681   .0735   .0792   .0852   .0915   .0980   .1048   .1119
13   .0297   .0327   .0358   .0391   .0427   .0464   .0504   .0546   .0591   .0638
14   .0143   .0159   .0176   .0195   .0216   .0238   .0261   .0286   .0313   .0342
15   .0065   .0073   .0082   .0092   .0103   .0114   .0127   .0141   .0156   .0173
16   .0028   .0031   .0036   .0041   .0046   .0052   .0059   .0066   .0074   .0082
17   .0011   .0013   .0015   .0017   .0020   .0022   .0026   .0029   .0033   .0037
18   .0004   .0005   .0006   .0007   .0008   .0009   .0011   .0012   .0014   .0016
19   .0002   .0002   .0002   .0003   .0003   .0004   .0004   .0005   .0006   .0007
20   .0001   .0001   .0001   .0001   .0001   .0001   .0002   .0002   .0002   .0003
21   .0000   .0000   .0000   .0000   .0000   .0000   .0001   .0001   .0001   .0001

                                          m
x'     8.1     8.2     8.3     8.4     8.5     8.6     8.7     8.8     8.9     9.0

 0  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
 1   .9997   .9997   .9998   .9998   .9998   .9998   .9998   .9998   .9999   .9999
 2   .9972   .9975   .9977   .9979   .9981   .9982   .9984   .9985   .9987   .9988
 3   .9873   .9882   .9891   .9900   .9907   .9914   .9921   .9927   .9932   .9938
 4   .9604   .9630   .9654   .9677   .9699   .9719   .9738   .9756   .9772   .9788
 5   .9060   .9113   .9163   .9211   .9256   .9299   .9340   .9379   .9416   .9450
 6   .8178   .8264   .8347   .8427   .8504   .8578   .8648   .8716   .8781   .8843
 7   .6987   .7104   .7219   .7330   .7438   .7543   .7645   .7744   .7840   .7932
 8   .5609   .5746   .5881   .6013   .6144   .6272   .6398   .6522   .6643   .6761
 9   .4214   .4353   .4493   .4631   .4769   .4906   .5042   .5177   .5311   .5443
10   .2959   .3085   .3212   .3341   .3470   .3600   .3731   .3863   .3994   .4126
11   .1942   .2045   .2150   .2257   .2366   .2478   .2591   .2706   .2822   .2940
12   .1193   .1269   .1348   .1429   .1513   .1600   .1689   .1780   .1874   .1970
13   .0687   .0739   .0793   .0850   .0909   .0971   .1035   .1102   .1171   .1242
14   .0372   .0405   .0439   .0476   .0514   .0555   .0597   .0642   .0689   .0739
15   .0190   .0209   .0229   .0251   .0274   .0299   .0325   .0353   .0383   .0415
16   .0092   .0102   .0113   .0125   .0138   .0152   .0168   .0184   .0202   .0220
17   .0042   .0047   .0053   .0059   .0066   .0074   .0082   .0091   .0101   .0111
18   .0018   .0021   .0023   .0027   .0030   .0034   .0038   .0043   .0048   .0053
19   .0008   .0009   .0010   .0011   .0013   .0015   .0017   .0019   .0022   .0024
20   .0003   .0003   .0004   .0005   .0005   .0006   .0007   .0008   .0009   .0011
21   .0001   .0001   .0002   .0002   .0002   .0002   .0003   .0003   .0004   .0004
22   .0000   .0000   .0001   .0001   .0001   .0001   .0001   .0001   .0002   .0002
23   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0000   .0001   .0001
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m 


x'     9.1     9.2     9.3     9.4     9.5     9.6     9.7     9.8     9.9    10.0

 0  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
 1   .9999   .9999   .9999   .9999   .9999   .9999   .9999   .9999  1.0000  1.0000
 2   .9989   .9990   .9991   .9991   .9992   .9993   .9993   .9994   .9995   .9995
 3   .9942   .9947   .9951   .9955   .9958   .9962   .9965   .9967   .9970   .9972
 4   .9802   .9816   .9828   .9840   .9851   .9862   .9871   .9880   .9889   .9897
 5   .9483   .9514   .9544   .9571   .9597   .9622   .9645   .9667   .9688   .9707
 6   .8902   .8959   .9014   .9065   .9115   .9162   .9207   .9250   .9290   .9329
 7   .8022   .8108   .8192   .8273   .8351   .8426   .8498   .8567   .8634   .8699
 8   .6877   .6990   .7101   .7208   .7313   .7416   .7515   .7612   .7706   .7798
 9   .5574   .5704   .5832   .5958   .6082   .6204   .6324   .6442   .6558   .6672
10   .4258   .4389   .4521   .4651   .4782   .4911   .5040   .5168   .5295   .5421
11   .3059   .3180   .3301   .3424   .3547   .3671   .3795   .3920   .4045   .4170
12   .2068   .2168   .2270   .2374   .2480   .2588   .2697   .2807   .2919   .3032
13   .1316   .1393   .1471   .1552   .1636   .1721   .1809   .1899   .1991   .2084
14   .0790   .0844   .0900   .0958   .1019   .1081   .1147   .1214   .1284   .1355
15   .0448   .0483   .0520   .0559   .0600   .0643   .0688   .0735   .0784   .0835
16   .0240   .0262   .0285   .0309   .0335   .0362   .0391   .0421   .0454   .0487
17   .0122   .0135   .0148   .0162   .0177   .0194   .0211   .0230   .0249   .0270
18   .0059   .0066   .0073   .0081   .0089   .0098   .0108   .0119   .0130   .0143
19   .0027   .0031   .0034   .0038   .0043   .0048   .0053   .0059   .0065   .0072
20   .0012   .0014   .0015   .0017   .0020   .0022   .0025   .0028   .0031   .0035
21   .0005   .0006   .0007   .0008   .0009   .0010   .0011   .0013   .0014   .0016
22   .0002   .0002   .0003   .0003   .0004   .0004   .0005   .0005   .0006   .0007
23   .0001   .0001   .0001   .0001   .0001   .0002   .0002   .0002   .0003   .0003
24   .0000   .0000   .0000   .0000   .0001   .0001   .0001   .0001   .0001   .0001
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Table D6 

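This table gives values of $t$ for specified values of $\nu$ and $F$, where

$$F(t; \nu) = \frac{1}{(\pi\nu)^{1/2}} \, \frac{\Gamma[(\nu + 1)/2]}{\Gamma(\nu/2)} \int_{-\infty}^{t} \left(1 + \frac{x^{2}}{\nu}\right)^{-(\nu + 1)/2} dx .$$

Note that $F(-t; \nu) = 1 - F(t; \nu)$.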
                                      F
  ν      .60      .75      .90      .95     .975      .99     .995    .9995

  1     .325    1.000    3.078    6.314   12.706   31.821   63.657  636.619
  2     .289     .816    1.886    2.920    4.303    6.965    9.925   31.598
  3     .277     .765    1.638    2.353    3.182    4.541    5.841   12.941
  4     .271     .741    1.533    2.132    2.776    3.747    4.604    8.610
  5     .267     .727    1.476    2.015    2.571    3.365    4.032    6.859
  6     .265     .718    1.440    1.943    2.447    3.143    3.707    5.959
  7     .263     .711    1.415    1.895    2.365    2.998    3.499    5.405
  8     .262     .706    1.397    1.860    2.306    2.896    3.355    5.041
  9     .261     .703    1.383    1.833    2.262    2.821    3.250    4.781
 10     .260     .700    1.372    1.812    2.228    2.764    3.169    4.587
 11     .260     .697    1.363    1.796    2.201    2.718    3.106    4.437
 12     .259     .695    1.356    1.782    2.179    2.681    3.055    4.318
 13     .259     .694    1.350    1.771    2.160    2.650    3.012    4.221
 14     .258     .692    1.345    1.761    2.145    2.624    2.977    4.140
 15     .258     .691    1.341    1.753    2.131    2.602    2.947    4.073
 16     .258     .690    1.337    1.746    2.120    2.583    2.921    4.015
 17     .257     .689    1.333    1.740    2.110    2.567    2.898    3.965
 18     .257     .688    1.330    1.734    2.101    2.552    2.878    3.922
 19     .257     .688    1.328    1.729    2.093    2.539    2.861    3.883
 20     .257     .687    1.325    1.725    2.086    2.528    2.845    3.850
 21     .257     .686    1.323    1.721    2.080    2.518    2.831    3.819
 22     .256     .686    1.321    1.717    2.074    2.508    2.819    3.792
 23     .256     .685    1.319    1.714    2.069    2.500    2.807    3.767
 24     .256     .685    1.318    1.711    2.064    2.492    2.797    3.745
 25     .256     .684    1.316    1.708    2.060    2.485    2.787    3.725
 26     .256     .684    1.315    1.706    2.056    2.479    2.779    3.707
 27     .256     .684    1.314    1.703    2.052    2.473    2.771    3.690
 28     .256     .683    1.313    1.701    2.048    2.467    2.763    3.674
 29     .256     .683    1.311    1.699    2.045    2.462    2.756    3.659
 30     .256     .683    1.310    1.697    2.042    2.457    2.750    3.646
 40     .255     .681    1.303    1.684    2.021    2.423    2.704    3.551
 60     .254     .679    1.296    1.671    2.000    2.390    2.660    3.460
120     .254     .677    1.289    1.658    1.980    2.358    2.617    3.373
  ∞     .253     .674    1.282    1.645    1.960    2.326    2.576    3.291
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APPENDIX D 


Table D 7 

This table gives values of $F$ such that

$$F(F; m, n) = \frac{\Gamma[(m + n)/2]}{\Gamma(m/2)\,\Gamma(n/2)} \left(\frac{m}{n}\right)^{m/2} \int_{0}^{F} \frac{x^{(m - 2)/2}}{[1 + (m/n)x]^{(m + n)/2}} \, dx ,$$

for specified values of $m$ and $n$. It can also be used for values corresponding to $F(F)$ = 0.10, 0.05, 0.025, 0.01, 0.005 and 0.001 by use of the relation

$$F_{1-\alpha}(n, m) = [F_{\alpha}(m, n)]^{-1} .$$
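
The reciprocal relation can be illustrated numerically by integrating the F density directly. The following is a minimal sketch, assuming only the Python standard library and restricted to m ≥ 2, where the integrand is finite at x = 0; the names `f_density` and `f_cdf` and the use of Simpson's rule are illustrative choices and are not taken from the text.

```python
import math


def f_density(x: float, m: float, n: float) -> float:
    """Density of the F distribution with (m, n) degrees of freedom, for x >= 0."""
    log_c = (math.lgamma((m + n) / 2) - math.lgamma(m / 2) - math.lgamma(n / 2)
             + (m / 2) * math.log(m / n))
    return math.exp(log_c) * x ** ((m - 2) / 2) / (1 + (m / n) * x) ** ((m + n) / 2)


def f_cdf(f: float, m: float, n: float, steps: int = 4000) -> float:
    """F(f; m, n): Simpson's-rule integral of the density over [0, f]."""
    if m < 2:
        raise ValueError("this sketch assumes m >= 2 (integrand finite at x = 0)")
    h = f / steps  # steps must be even for Simpson's rule
    total = f_density(0.0, m, n) + f_density(f, m, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_density(i * h, m, n)
    return total * h / 3


if __name__ == "__main__":
    upper = 3.18  # tabulated value with F(F; m, n) = 0.90 for m = 4, n = 6
    print(f"{f_cdf(upper, 4, 6):.3f}")      # close to 0.90
    print(f"{f_cdf(1 / upper, 6, 4):.3f}")  # close to 0.10, by the reciprocal relation
```

Here the tabulated 0.90 point for m = 4, n = 6 is 3.18, so the 0.10 point with the degrees of freedom interchanged is 1/3.18 ≈ 0.314, and the two printed probabilities should sum to unity up to rounding of the tabulated value.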


F(F; m, n) = 0.90

                              m
  n      1      2      3      4      5      6      7      8      9

  1  39.86  49.50  53.59  55.83  57.24  58.20  58.91  59.44  59.86
  2   8.53   9.00   9.16   9.24   9.29   9.33   9.35   9.37   9.38
  3   5.54   5.46   5.39   5.34   5.31   5.28   5.27   5.25   5.24
  4   4.54   4.32   4.19   4.11   4.05   4.01   3.98   3.95   3.94
  5   4.06   3.78   3.62   3.52   3.45   3.40   3.37   3.34   3.32
  6   3.78   3.46   3.29   3.18   3.11   3.05   3.01   2.98   2.96
  7   3.59   3.26   3.07   2.96   2.88   2.83   2.78   2.75   2.72
  8   3.46   3.11   2.92   2.81   2.73   2.67   2.62   2.59   2.56
  9   3.36   3.01   2.81   2.69   2.61   2.55   2.51   2.47   2.44
 10   3.29   2.92   2.73   2.61   2.52   2.46   2.41   2.38   2.35
 11   3.23   2.86   2.66   2.54   2.45   2.39   2.34   2.30   2.27
 12   3.18   2.81   2.61   2.48   2.39   2.33   2.28   2.24   2.21
 13   3.14   2.76   2.56   2.43   2.35   2.28   2.23   2.20   2.16
 14   3.10   2.73   2.52   2.39   2.31   2.24   2.19   2.15   2.12
 15   3.07   2.70   2.49   2.36   2.27   2.21   2.16   2.12   2.09
 16   3.05   2.67   2.46   2.33   2.24   2.18   2.13   2.09   2.06
 17   3.03   2.64   2.44   2.31   2.22   2.15   2.10   2.06   2.03
 18   3.01   2.62   2.42   2.29   2.20   2.13   2.08   2.04   2.00
 19   2.99   2.61   2.40   2.27   2.18   2.11   2.06   2.02   1.98
 20   2.97   2.59   2.38   2.25   2.16   2.09   2.04   2.00   1.96
 21   2.96   2.57   2.36   2.23   2.14   2.08   2.02   1.98   1.95
 22   2.95   2.56   2.35   2.22   2.13   2.06   2.01   1.97   1.93
 23   2.94   2.55   2.34   2.21   2.11   2.05   1.99   1.95   1.92
 24   2.93   2.54   2.33   2.19   2.10   2.04   1.98   1.94   1.91
 25   2.92   2.53   2.32   2.18   2.09   2.02   1.97   1.93   1.89
 26   2.91   2.52   2.31   2.17   2.08   2.01   1.96   1.92   1.88
 27   2.90   2.51   2.30   2.17   2.07   2.00   1.95   1.91   1.87
 28   2.89   2.50   2.29   2.16   2.06   2.00   1.94   1.90   1.87
 29   2.89   2.50   2.28   2.15   2.06   1.99   1.93   1.89   1.86
 30   2.88   2.49   2.28   2.14   2.05   1.98   1.93   1.88   1.85
 40   2.84   2.44   2.23   2.09   2.00   1.93   1.87   1.83   1.79
 60   2.79   2.39   2.18   2.04   1.95   1.87   1.82   1.77   1.74
120   2.75   2.35   2.13   1.99   1.90   1.82   1.77   1.72   1.68
  ∞   2.71   2.30   2.08   1.94   1.85   1.77   1.72   1.67   1.63
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Table D 7 ( continued ) 


                                      m
   n     10     12     15     20     24     30     40     60    120      ∞

   1  60.19  60.71  61.22  61.74  62.00  62.26  62.53  62.79  63.06  63.33
   2   9.39   9.41   9.42   9.44   9.45   9.46   9.47   9.47   9.48   9.49
   3   5.23   5.22   5.20   5.18   5.18   5.17   5.16   5.15   5.14   5.13
   4   3.92   3.90   3.87   3.84   3.83   3.82   3.80   3.79   3.78   3.76
   5   3.30   3.27   3.24   3.21   3.19   3.17   3.16   3.14   3.12   3.10
   6   2.94   2.90   2.87   2.84   2.82   2.80   2.78   2.76   2.74   2.72
   7   2.70   2.67   2.63   2.59   2.58   2.56   2.54   2.51   2.49   2.47
   8   2.54   2.50   2.46   2.42   2.40   2.38   2.36   2.34   2.32   2.29
   9   2.42   2.38   2.34   2.30   2.28   2.25   2.23   2.21   2.18   2.16
  10   2.32   2.28   2.24   2.20   2.18   2.16   2.13   2.11   2.08   2.06
  11   2.25   2.21   2.17   2.12   2.10   2.08   2.05   2.03   2.00   1.97
  12   2.19   2.15   2.10   2.06   2.04   2.01   1.99   1.96   1.93   1.90
  13   2.14   2.10   2.05   2.01   1.98   1.96   1.93   1.90   1.88   1.85
  14   2.10   2.05   2.01   1.96   1.94   1.91   1.89   1.86   1.83   1.80
  15   2.06   2.02   1.97   1.92   1.90   1.87   1.85   1.82   1.79   1.76
  16   2.03   1.99   1.94   1.89   1.87   1.84   1.81   1.78   1.75   1.72
  17   2.00   1.96   1.91   1.86   1.84   1.81   1.78   1.75   1.72   1.69
  18   1.98   1.93   1.89   1.84   1.81   1.78   1.75   1.72   1.69   1.66
  19   1.96   1.91   1.86   1.81   1.79   1.76   1.73   1.70   1.67   1.63
  20   1.94   1.89   1.84   1.79   1.77   1.74   1.71   1.68   1.64   1.61
  21   1.92   1.87   1.83   1.78   1.75   1.72   1.69   1.66   1.62   1.59
  22   1.90   1.86   1.81   1.76   1.73   1.70   1.67   1.64   1.60   1.57
  23   1.89   1.84   1.80   1.74   1.72   1.69   1.66   1.62   1.59   1.55
  24   1.88   1.83   1.78   1.73   1.70   1.67   1.64   1.61   1.57   1.53
  25   1.87   1.82   1.77   1.72   1.69   1.66   1.63   1.59   1.56   1.52
  26   1.86   1.81   1.76   1.71   1.68   1.65   1.61   1.58   1.54   1.50
  27   1.85   1.80   1.75   1.70   1.67   1.64   1.60   1.57   1.53   1.49
  28   1.84   1.79   1.74   1.69   1.66   1.63   1.59   1.56   1.52   1.48
  29   1.83   1.78   1.73   1.68   1.65   1.62   1.58   1.55   1.51   1.47
  30   1.82   1.77   1.72   1.67   1.64   1.61   1.57   1.54   1.50   1.46
  40   1.76   1.71   1.66   1.61   1.57   1.54   1.51   1.47   1.42   1.38
  60   1.71   1.66   1.60   1.54   1.51   1.48   1.44   1.40   1.35   1.29
 120   1.65   1.60   1.55   1.48   1.45   1.41   1.37   1.32   1.26   1.19
   ∞   1.60   1.55   1.49   1.42   1.38   1.34   1.30   1.24   1.17   1.00
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Table D 7 ( continued ) 

F(F; m, n) = 0.95

                                  m
   n      1      2      3      4      5      6      7      8      9

   1  161.4  199.5  215.7  224.6  230.2  234.0  236.8  238.9  240.5
   2  18.51  19.00  19.16  19.25  19.30  19.33  19.35  19.37  19.38
   3  10.13   9.55   9.28   9.12   9.01   8.94   8.89   8.85   8.81
   4   7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00
   5   6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.77
   6   5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10
   7   5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68
   8   5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39
   9   5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18
  10   4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02
  11   4.84   3.98   3.59   3.36   3.20   3.09   3.01   2.95   2.90
  12   4.75   3.89   3.49   3.26   3.11   3.00   2.91   2.85   2.80
  13   4.67   3.81   3.41   3.18   3.03   2.92   2.83   2.77   2.71
  14   4.60   3.74   3.34   3.11   2.96   2.85   2.76   2.70   2.65
  15   4.54   3.68   3.29   3.06   2.90   2.79   2.71   2.64   2.59
  16   4.49   3.63   3.24   3.01   2.85   2.74   2.66   2.59   2.54
  17   4.45   3.59   3.20   2.96   2.81   2.70   2.61   2.55   2.49
  18   4.41   3.55   3.16   2.93   2.77   2.66   2.58   2.51   2.46
  19   4.38   3.52   3.13   2.90   2.74   2.63   2.54   2.48   2.42
  20   4.35   3.49   3.10   2.87   2.71   2.60   2.51   2.45   2.39
  21   4.32   3.47   3.07   2.84   2.68   2.57   2.49   2.42   2.37
  22   4.30   3.44   3.05   2.82   2.66   2.55   2.46   2.40   2.34
  23   4.28   3.42   3.03   2.80   2.64   2.53   2.44   2.37   2.32
  24   4.26   3.40   3.01   2.78   2.62   2.51   2.42   2.36   2.30
  25   4.24   3.39   2.99   2.76   2.60   2.49   2.40   2.34   2.28
  26   4.23   3.37   2.98   2.74   2.59   2.47   2.39   2.32   2.27
  27   4.21   3.35   2.96   2.73   2.57   2.46   2.37   2.31   2.25
  28   4.20   3.34   2.95   2.71   2.56   2.45   2.36   2.29   2.24
  29   4.18   3.33   2.93   2.70   2.55   2.43   2.35   2.28   2.22
  30   4.17   3.32   2.92   2.69   2.53   2.42   2.33   2.27   2.21
  40   4.08   3.23   2.84   2.61   2.45   2.34   2.25   2.18   2.12
  60   4.00   3.15   2.76   2.53   2.37   2.25   2.17   2.10   2.04
 120   3.92   3.07   2.68   2.45   2.29   2.17   2.09   2.02   1.96
   ∞   3.84   3.00   2.60   2.37   2.21   2.10   2.01   1.94   1.88
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Table D7 ( continued) 


m 


n

10 

12 

15 

20 

24 

30 

40 

60 

120 

00 

1 

241-9 

243-9 

245-9 

248-0 

249-1 

250*1 

251*1 

252-2 

253*3 

254*3 

2 

19-40 

19-41 

19-43 

19*45 

19-45 

19*46 

19*47 

19*48 

19*49 

19*50 

3 

8-79 

8-74 

8-70 

8-66 

8-64 

8-62 

8*59 

8-57 

8-55 

8*53 

4 

5-96 

5*91 

5-86 

5-80 

5-77 

5-75 

5*72 

5*69 

5-66 

5-63 

5 

4-74 

4-68 

4-62 

4-56 

4*53 

4-50 

4-46 

4-43 

4-40 

4*36 

6 

4-06 

4-00 

3-94 

3*87 

3-84 

3 81 

3-77 

3*74 

3-70 

3*67 

7 

3-64 

3-57 

3-51 

3*44 

3-41 

3-38 

3-34 

3-30 

3*27 

3*23 

8 

3-35 

3-28 

3-22 

3-15 

3*12 

3-08 

304 

3*01 

2-97 

2*93 

9 

3-14 

3-07 

3-01 

2-94 

2-90 

2-86 

2-83 

2-70 

2*75 

2*71 

10 

2-98 

2-91 

2-85 

2-77 

2-74 

2-70 

2-66 

2*62 

2-58 

2-54 

11 

2-85 

2-79 

2-72 

2-65 

2-61 

2-57 

2-53 

2*49 

2*45 

2-40 

12 

2-75 

2-69 

2-62 

2-54 

2-51 

2-47 

2*43 

2*38 

2*34 

2*30 

13 

2-67 

2-60 

2-53 

2*46 

2-42 

2*38 

2*34 

2-30 

2-25 

2*21 

14 

2-60 

2-53 

2-46 

2-39 

2*35 

2-31 

2*27 

2-22 

2-18 

2*13 

15 

2-54 

2-48 

2-40 

2-33 

2*29 

2-25 

2-20 

2-16 

2*11 

2-07 

16 

2-49 

2-42 

2-35 

2-28 

2-24 

2-19 

2-15 

2*11 

2*06 

2*01 

17 

2-45 

2-38 

2-31 

2-23 

2-19 

2*15 

2*10 

2*06 

2*01 

1*96 

18 

2-41 

2-34 

2-27 

2-19 

2-15 

2*11 

2*06 

2-02 

1-97 

1*92 

19 

2-38 

2-31 

2-23 

2-16 

2*11 

2-07 

2*03 

1-98 

1*93 

1*88 

20 

2-35 

2-28 

2-20 

2-12 

2-08 

204 

1-99 

1-95 

1-90 

1-84 

21 

2-32 

2-25 

2-18 

2-10 

2-05 

2*01 

1*96 

1*92 

1-87 

1-81 

22 

2-30 

2-23 

2-15 

2*07 

2-03 

1*98 

1*94 

1*89 

1-84 

1*78 

23 

2-27 

2-20 

2-13 

2*05 

2-01 

1-96 

1*91 

1*86 

1*81 

1*76 

24 

2-25 

2-18 

2-11 

2-03 

1-98 

1*94 

1*89 

1*84 

1*79 

1-73 

25 

2-24 

2-16 

2-09 

2*01 

1*96 

1*92 

1-87 

1*82 

1-77 

1*71 

26 

2-22 

2-15 

2-07 

1-99 

1-95 

1*90 

1-85 

1*80 

1*75 

1*69 

27 

2-20 

2-13 

2-06 

1*97 

1*93 

1-88 

1-84 

1*79 

1*73 

1*67 

28 

219 

2-12 

2-04 

1*96 

1-91 

1*87 

1-82 

1*77 

1-71 

1*65 

29 

2*18 

2-10 

2-03 

1-94 

1-90 

1*85 

1*81 

1*75 

1-70 

1*64 

30 

2-16 

2-09 

2-01 

1*93 

1-89 

1-84 

1*79 

1-74 

1-68 

1-62 

40 

2-08 

200 

1-92 

1-84 

1-79 

1*74 

1 *69 

1*64 

1*58 

1*51 

60 

1-99 

1*92 

1*84 

1-75 

1-70 

1-65 

1-59 

1*53 

1*47 

1*39 

120 

1-91 

1-83 

1-75 

1*66 

1-61 

1-55 

1-50 

1*43 

1*35 

1*25 

00 

1-83 

1-75 

1-67 

1-57 

1-52 

1-46 

1-39 

1-32 

1-22 

1 00 
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F ( F ; m , n ) = 0*975 


Table D 7 ( continued ) 


m 


n 

1 

2 

3 

4 

5 

6 

7 

8 

9 

1 

647-8 

799-5 

864-2 

899-6 

921-8 

937-1 

948*2 

956-7 

963-3 

2 

38-51 

3900 

39-17 

39*25 

39-30 

39*33 

39-36 

39-37 

39-39 

3 

17-44 

16 04 

15-44 

15-10 

14-88 

14*73 

14-62 

14*54 

14-47 

4 

12-22 

10-65 

9-98 

9-60 

9-36 

9-20 

9*07 

8-98 

8-90 

5 

10-01 

8-43 

7-76 

7*39 

7-15 

6-98 

6*85 

6-76 

6*68 

6 

8-81 

7-26 

6-60 

6-23 

5-99 

5-82 

5-70 

5*60 

5*52 

7 

807 

6-54 

5-89 

5-52 

5*29 

5*12 

4-99 

4-90 

4-82 

8 

7-57 

6-06 

5-42 

5*05 

4-82 

4-65 

4*53 

4-43 

4-36 

9 

7-21 

5-71 

5-08 

4*72 

4-48 

4-32 

4-20 

4-10 

4-03 

10 

6-94 

5-46 

4-83 

4-47 

4*24 

4*07 

3-95 

3*85 

3-78 

11 

6-72 

5-29 

4-63 

4-28 

4*04 

3*88 

3-76 

3-66 

3*59 

12 

6-55 

5-10 

4-47 

4-12 

3-89 

3*73 

3*61 

3*51 

3-44 

13 

6-41 

4*97 

4-35 

400 

3-77 

3-60 

3*48 

3*39 

3-31 

14 

6-30 

4-86 

4-24 

3-89 

3*66 

3*50 

3-38 

3-29 

3*21 

15 

6-20 

4*77 

4*15 

3*80 

3-58 

3-41 

3*29 

3*20 

3 12 

16 

6*12 

4-69 

4*08 

3*73 

3*50 

3-34 

3*22 

3-12 

3-05 

17 

6-04 

4-62 

4-01 

3-66 

3-44 

3-28 

3-16 

3*06 

2-98 

18 

5-98 

4-56 

3*95 

3-61 

3-38 

3-22 

3*10 

3*01 

2-93 

19 

5-92 

4-51 

3-90 

3*56 

3*33 

3-17 

3*05 

2-96 

2*88 

20 

5-87 

4-46 

3-86 

3*51 

3-29 

3*13 

3-01 

2*91 

2*84 

21 

5-83 

4*42 

3-82 

3-48 

3*25 

3-09 

2-97 

2-87 

2*80 

22 

5-79 

4-38 

3*78 

3-44 

3-22 

3*05 

2*93 

2-84 

2*76 

23 

5-75 

4-35 

3-75 

3*41 

3*18 

3*02 

2*90 

2*81 

2*73 

24 

5-72 

4-32 

3-72 

3-38 

3*15 

2*99 

2-87 

2-78 

2-70 

25 

5-69 

4*29 

2-69 

3*35 

3*13 

2*97 

2*85 

2*75 

2-68 

26 

5-66 

4-27 

3-67 

3*33 

3-10 

2*94 

2-82 

2*73 

2-65 

27 

5-63 

4-24 

3-65 

3-31 

3*08 

2-92 

2*80 

2-71 

2*63 

28 

5-61 

4-22 

3-63 

3-29 

3-06 

2-90 

2*78 

2-69 

2*61 

29 

5-59 

4-20 

3*61 

3*27 

3*04 

2-88 

2*76 

2-67 

2-59 

30 

5-57 

4-18 

3-59 

3*25 

3-03 

2-87 

2-75 

2-65 

2*57 

40 

5-42 

4-05 

3*46 

3-13 

2-90 

2-74 

2-62 

2*53 

2-45 

60 

5-29 

3-93 

3-34 

3 01 

2-79 

2-63 

2-51 

2-41 

2-33 

120 

515 

3-80 

3*23 

2*89 

2-67 

2-52 

2-39 

2-30 

2-22 

00 

5-02 

3-69 

3-12 

2-79 

2-57 

2*41 

2-29 

2*19 

2*11 
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Table D7 ( continued ) 



                                      m
   n     10     12     15     20     24     30     40     60    120      ∞

   1  968.6  976.7  984.9  993.1  997.2   1001   1006   1010   1014   1018
   2  39.40  39.41  39.43  39.45  39.46  39.46  39.47  39.48  39.49  39.50
   3  14.42  14.34  14.25  14.17  14.12  14.08  14.04  13.99  13.95  13.90
   4   8.84   8.75   8.66   8.56   8.51   8.46   8.41   8.36   8.31   8.26
   5   6.62   6.52   6.43   6.33   6.28   6.23   6.18   6.12   6.07   6.02
   6   5.46   5.37   5.27   5.17   5.12   5.07   5.01   4.96   4.90   4.85
   7   4.76   4.67   4.57   4.47   4.42   4.36   4.31   4.25   4.20   4.14
   8   4.30   4.20   4.10   4.00   3.95   3.89   3.84   3.78   3.73   3.67
   9   3.96   3.87   3.77   3.67   3.61   3.56   3.51   3.45   3.39   3.33
  10   3.72   3.62   3.52   3.42   3.37   3.31   3.26   3.20   3.14   3.08
  11   3.53   3.43   3.33   3.23   3.17   3.12   3.06   3.00   2.94   2.88
  12   3.37   3.28   3.18   3.07   3.02   2.96   2.91   2.85   2.79   2.72
  13   3.25   3.15   3.05   2.95   2.89   2.84   2.78   2.72   2.66   2.60
  14   3.15   3.05   2.95   2.84   2.79   2.73   2.67   2.61   2.55   2.49
  15   3.06   2.96   2.86   2.76   2.70   2.64   2.59   2.52   2.46   2.40
  16   2.99   2.89   2.79   2.68   2.63   2.57   2.51   2.45   2.38   2.32
  17   2.92   2.82   2.72   2.62   2.56   2.50   2.44   2.38   2.32   2.25
  18   2.87   2.77   2.67   2.56   2.50   2.44   2.38   2.32   2.26   2.19
  19   2.82   2.72   2.62   2.51   2.45   2.39   2.33   2.27   2.20   2.13
  20   2.77   2.68   2.57   2.46   2.41   2.35   2.29   2.22   2.16   2.09
  21   2.73   2.64   2.53   2.42   2.37   2.31   2.25   2.18   2.11   2.04
  22   2.70   2.60   2.50   2.39   2.33   2.27   2.21   2.14   2.08   2.00
  23   2.67   2.57   2.47   2.36   2.30   2.24   2.18   2.11   2.04   1.97
  24   2.64   2.54   2.44   2.33   2.27   2.21   2.15   2.08   2.01   1.94
  25   2.61   2.51   2.41   2.30   2.24   2.18   2.12   2.05   1.98   1.91
  26   2.59   2.49   2.39   2.28   2.22   2.16   2.09   2.03   1.95   1.88
  27   2.57   2.47   2.36   2.25   2.19   2.13   2.07   2.00   1.93   1.85
  28   2.55   2.45   2.34   2.23   2.17   2.11   2.05   1.98   1.91   1.83
  29   2.53   2.43   2.32   2.21   2.15   2.09   2.03   1.96   1.89   1.81
  30   2.51   2.41   2.31   2.20   2.14   2.07   2.01   1.94   1.87   1.79
  40   2.39   2.29   2.18   2.07   2.01   1.94   1.88   1.80   1.72   1.64
  60   2.27   2.17   2.06   1.94   1.88   1.82   1.74   1.67   1.58   1.48
 120   2.16   2.05   1.94   1.82   1.76   1.69   1.61   1.53   1.43   1.31
   ∞   2.05   1.94   1.83   1.71   1.64   1.57   1.48   1.39   1.27   1.00
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APPENDIX D 


F(F; m, n) = 0.99


Table D7 ( continued ) 


m 


n 

1 

2 

3 

4 

1 

4052 

4999-5 

5403 

5625 

2 

98-50 

99-00 

99-17 

99-25 

3 

34-12 

30*82 

29-46 

28-71 

4 

21-20 

18-00 

16-69 

15-98 

5 

16-26 

13-27 

12-06 

11-39 

6 

13-75 

10-92 

9-78 

9-15 

7 

12-25 

9-55 

8-45 

7-85 

8 

11-26 

8-65 

7-59 

7-01 

9 

10-56 

8-02 

6-99 

6-42 

10 

10-04 

7-56 

6-55 

5-90 

11 

9-65 

7-21 

6-22 

5-67 

12 

9-33 

6-93 

5-95 

5-41 

13 

9-07 

6-70 

5-74 

5-21 

14 

8-86 

6-51 

5-56 

5 04 

15 

8-68 

6-36 

5-42 

4-89 

16 

8-53 

6-23 

5-29 

4-77 

17 

8*40 

6-11 

5-18 

4-67 

18 

8-29 

6-01 

5-09 

4-58 

19 

8-18 

5-93 

5-01 

4-50 

20 

8-10 

5-85 

4-94 

4-43 

21 

8-02 

5-78 

4-87 

4-37 

22 

7-95 

5*72 

4-82 

4-31 

23 

7-88 

5-66 

4-76 

4-26 

24 

7-82 

5-61 

4-72 

4-22 

25 

7-77 

5-57 

4-68 

4-18 

26 

7-72 

5-53 

4-64 

4-14 

27 

7-68 

5-49 

4-60 

4-11 

28 

7-64 

5-45 

4-57 

4-07 

29 

7-60 

5-42 

4-54 

4-04 

30 

7-56 

5-39 

4-51 

4-02 

40 

7-31 

5-18 

4-31 

3-83 

60 

7-08 

4-98 

4-13 

3-65 

120 

6-85 

4-79 

3-95 

3-48 

00 

6-63 

4-61 

3-78 

3-32 


5 

6 

7 

8 

9 

5764 

5859 

5928 

5982 

6022 

99-30 

99-33 

99-36 

99-37 

99-39 

28-24 

27-91 

27-67 

27-49 

27-35 

15-52 

15-21 

14-98 

14-80 

14-66 

10-97 

10-67 

10-46 

10-29 

10-16 

8-75 

8-47 

8-26 

8-10 

7-98 

7-46 

7-19 

6-99 

6-84 

6-72 

6-63 

6-37 

6-18 

6-03 

5-91 

6-06 

5-80 

5-61 

5-47 

5-35 

5-64 

5-39 

5-20 

5-06 

4-94 

5-32 

5-07 

4-89 

4-74 

4-63 

5-06 

4-82 

4-64 

4-50 

4-39 

4-86 

4-62 

4-44 

4-30 

4-19 

4-69 

4-46 

4-28 

4-14 

4-03 

4-56 

4-32 

4-14 

400 

3-89 

4-44 

4-20 

4-03 

3-89 

3-78 

4-34 

4-10 

3-93 

3-79 

3-68 

4-25 

4-01 

3-84 

3-71 

3-60 

4-17 

3-94 

3-77 

3-63 

3-52 

4-10 

3-87 

3-70 

3-56 

3-46 

4-04 

3-81 

3-64 

3-51 

3-40 

3-99 

3-76 

3-59 

3-45 

3-35 

3-94 

3-71 

3-54 

3-41 

3-30 

3-90 

3-67 

3-50 

3-36 

3-26 

3-85 

3-63 

3-46 

3-32 

3-22 

3-82 

3-59 

3-42 

3-29 

3-18 

3-78 

3-56 

3-39 

3-26 

3-15 

3-75 

3-53 

3*36 

3-23 

3-12 

3-73 

3-50 

3-33 

3-20 

3-09 

3-70 

3-47 

3-30 

3-17 

3-07 

3-51 

3-29 

3-12 

2-99 

2-89 

3-34 

3-12 

2-95 

2*82 

2-72 

3-17 

2-96 

2-79 

2-66 

2-56 

3-02 

2-80 

2-64 

2*51 

2-41 
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Table D 7 ( continued ) 


m 


n

10 

12 

15 

20 

24 

30 

40 

60 

120 

00 

1 6056 6106 6157 6209 6235 6261 6287 6313 6339 6366 

2 

99*40 

99*42 

99*43 

99*45 

99*46 

99*47 

99*47 

99*48 

99*49 

99*50 

3 

27*23 

27*05 

26*87 

26*69 

26*60 

26*50 

26*41 

26*32 

26*22 

26*13 

4 

14*55 

14*37 

14*20 

14*02 

13*93 

13*84 

13*75 

13-65 

13*56 

13*46 

5 

10*05 

9*89 

9*72 

9*55 

9*47 

9*38 

9*29 

9*20 

9-11 

9*02 

6 

7*87 

7*72 

7*56 

7*40 

7*31 

7*23 

7*14 

7*06 

6*97 

6*88 

7 

6*62 

6*47 

6*31 

6*16 

6*07 

5*99 

5-91 

5*82 

5*74 

5*65 

8 

5*81 

5*67 

5*52 

5*36 

5*28 

5*20 

5*12 

5*03 

4*95 

4*86 

9 

5*26 

5-11 

4*96 

4 81 

4*73 

4*56 

4*57 

4*48 

4*40 

4*31 

10 

4*85 

4*71 

4*56 

4-41 

4*33 

4*25 

4*17 

4*08 

400 

3-91 

11 

4*54 

440 

4*25 

4*10 

4*02 

3*94 

3*86 

3*78 

3*69 

3*60 

12 

4*30 

4*16 

4*01 

3*86 

3*78 

3*70 

3*62 

3*54 

3*45 

3*36 

13 

4*10 

3*96 

3*82 

3*66 

3*59 

3*51 

3*43 

3*34 

3*25 

3*17 

14 

3*94 

3*80 

3*66 

3-51 

3*43 

3*35 

3*27 

3*18 

3*09 

300 

15 

3*80 

3*67 

3*52 

3*37 

3*29 

3*21 

3*13 

3*05 

2*96 

2*87 

16 

3*69 

3*55 

3*41 

3*26 

3*18 

3*10 

3*02 

2*93 

2*84 

2*75 

17 

3*59 

3*46 

3*31 

3*16 

3*08 

3 00 

2*92 

2*83 

2*75 

2*65 

18 

3*51 

3*37 

3*23 

3*08 

3 00 

2*92 

2*84 

2*75 

2*66 

2*57 

19 

3*43 

3*30 

3*15 

3*00 

2*92 

2*84 

2*76 

2*67 

2*58 

2*49 

20 

3*37 

3*23 

3*09 

2*94 

2*86 

2*78 

2*69 

2-61 

2*52 

2*42 

21 

3-31 

3*17 

3*03 

2*88 

2*80 

2*72 

2*64 

2*55 

2*46 

2*36 

22 

3*26 

3-12 

2*98 

2*83 

2*75 

2*67 

2*58 

2*50 

2*40 

2*31 

23 

3*21 

3*07 

2*93 

2*78 

2*70 

2*62 

2*54 

2*45 

2*35 

2*26 

24 

3*17 

3*03 

2*89 

2*74 

2*65 

2*58 

2*49 

2*40 

2*31 

2*21 

25 

3*13 

2*99 

2*85 

2*70 

2*62 

2*54 

2*45 

2*36 

2*27 

2*17 

26 

3 09 

2*96 

2*81 

2*66 

2*58 

2*50 

2*42 

2*33 

2*23 

2*13 

27 

3*06 

2*93 

2*78 

2*63 

2*55 

2*47 

2*38 

2*29 

2-20 

2*10 

28 

3*03 

2*90 

2*75 

2*60 

2*52 

2*44 

2*35 

2*26 

2-17 

2*06 

29 

300 

2*87 

2*73 

2*57 

2*49 

2*41 

2*33 

2*23 

2*14 

2*03 

30 

2*98 

2*84 

2*70 

2*55 

2*47 

2*39 

2*30 

2*21 

2 11 

2*01 

40 

2*80 

2*66 

2*52 

2*37 

2*29 

2*20 

2*11 

2*02 

1-92 

1*80 

60 

2*63 

2*50 

2*35 

2*20 

2*12 

2*03 

1*94 

1*84 

1*73 

1*60 

120 

2*47 

2*34 

2*19 

2*03 

1*95 

1*86 

1*76 

1-66 

1*53 

1*38 

00 

2*32 

2*18 

204 

1*88 

1*79 

1*70 

1*59 

1*47 

1*32 

1*00 





F(F; m, n) = 0.995


Table D 7 ( continued ) 


m 


n 

1 

2 

3 

4 

5 

6 

7 

8 

9 

1 

16211 

20000 

21615 

22500 

23056 

23437 

23715 

23925 

24091 

2 

198*5 

199*0 

199*2 

199*2 

199*3 

199*3 

199*4 

199*4 

199*4 

3 

55*55 

49*80 

47*47 

46*19 

45*39 

44*84 

44*43 

44*13 

43*88 

4 

31*33 

26*28 

24*26 

23*15 

22*46 

21*97 

21*62 

21*35 

21*14 

5 

22*78 

18*31 

16*53 

15*56 

14*94 

14*51 

14*20 

13*96 

13*77 

6 

18*63 

14*54 

12*92 

12*03 

11*46 

11*07 

10*70 

10*57 

10*39 

7 

16*24 

12*40 

10*88 

10*05 

9*52 

9-16 

8*89 

8*68 

8*51 

8 

14*69 

11*04 

9*60 

8*81 

8*30 

7*95 

7*69 

7*50 

7*34 

9 

13*61 

10*11 

8*72 

7*96 

7*47 

7*13 

6*88 

6*69 

6*54 

10 

12*83 

9*43 

8*08 

7*34 

6*87 

6*54 

6*30 

6*12 

5*97 

11 

12*23 

8*91 

7*60 

6*88 

6*42 

6*10 

5*86 

5*68 

5*54 

12 

11*75 

8*51 

7*23 

6*52 

6*07 

5*76 

5*52 

5*35 

5*20 

13 

11*37 

8*19 

6*93 

6*23 

5*79 

5*48 

5*25 

5*08 

4*94 

14 

11*06 

7*92 

6*68 

6*00 

5*56 

5*26 

5*03 

4*86 

4-72 

15 

10*80 

7*70 

6*48 

5*80 

5*37 

5*07 

4*85 

4*67 

4*54 

16 

10*58 

7*51 

6*30 

5*64 

5*21 

4 91 

4*69 

4*52 

4*38 

17 

10*38 

7*35 

6*16 

5*50 

5*07 

4*78 

4*56 

4*39 

4*25 

18 

10*22 

7*21 

6*03 

5*37 

4*96 

4*66 

4*44 

4*28 

4*14 

19 

10*07 

7*09 

5*92 

5*27 

4*85 

4*56 

4*34 

4*18 

4*04 

20 

9*94 

6*99 

5*82 

5*17 

4*76 

4*47 

4*26 

4*09 

3*96 

21 

9*83 

6*89 

5*73 

5*09 

4*68 

4*39 

4*18 

4*01 

3*88 

22 

9*73 

6*81 

5*65 

5*02 

4*61 

4*32 

4 11 

3*94 

3*81 

23 

9*63 

6*73 

5*58 

4*95 

4*54 

4*26 

4*05 

3*88 

3*75 

24 

9*55 

6*66 

5*52 

4*89 

4*49 

4*20 

3*99 

3*83 

3*69 

25 

9*48 

6*60 

5*46 

4*84 

4*43 

4*15 

3*94 

3*78 

3*64 

26 

9*41 

6*54 

5*41 

4*79 

4*38 

4*10 

3*89 

3*73 

3*60 

27 

9*34 

6*49 

5*36 

4*74 

4*34 

4*06 

3*85 

3*69 

3*56 

28 

9*28 

6*44 

5*23 

4*70 

4*30 

4*02 

3*81 

3*65 

3*52 

29 

9*23 

6*40 

5*28 

4*66 

4*26 

3*98 

3*77 

3*61 

3*48 

30 

9*18 

6*35 

5*24 

4*62 

4*23 

3*95 

3*74 

3*58 

3*45 

40 

8*83 

6*07 

4*98 

4*37 

3*99 

3*71 

3*51 

3*25 

3*22 

60 

8*49 

5*79 

4*73 

4*14 

3*76 

3*49 

3*29 

313 

3*01 

120 

8*18 

5*54 

4*50 

3*92 

3*55 

3*28 

3*09 

2*93 

2*81 

co 

7*88 

5*30 

4*28 

3*72 

3*35 

3*09 

2*90 

2*74 

2*62 
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Table D 7 ( continued ) 


ft 


m 

JO 12 15 20 24 30 40 60 120 


1 

2 

3 

4 


24224 24426 24630 24836 24940 25044 25148 25253 

199-4 199-4 199-4 199-4 199-5 199-5 199-5 199-5 

43-69 43-39 43-08 42-78 42-62 42-47 42-31 42-15 

20-97 20-70 20-44 20 17 20-03 19-89 19-75 19-61 


25359 25465 

199-5 199-5 

41-99 41-83 

19-47 19-32 


5 

6 

7 

8 
9 


13-62 

13-38 

13-15 

12-90 

12-78 

10-25 

10-03 

9-81 

9-59 

9-47 

8-38 

8-18 

7-97 

7*75 

7*65 

7-21 

7-01 

6-81 

6-61 

6-50 

6-42 

6-23 

6-03 

5-83 

5-73 


12-66 

12-53 

12-40 

12-27 

12-14 

9-36 

9-24 

9-12 

9*00 

8-88 

7-53 

7-42 

7*31 

7-19 

7-08 

6-40 

6-29 

618 

6-06 

5-95 

5-62 

5-52 

5 41 

5-30 

5-19 


10 

5-85 

5-66 

5-47 

5-27 

11 

5-42 

5-24 

5-05 

4-86 

12 

5-09 

4-91 

4-72 

4-53 

13 

4-82 

4-64 

4-46 

4*27 

14 

4-60 

4-43 

4-25 

4-06 


5-17 5-07 4-97 4-86 4-75 4-64 

4-76 4-65 4-55 4-44 4-34 4-23 

4-43 4-33 4-23 4-12 4-01 3-90 

4-17 4-07 3-97 3-87 3-76 3-65 

3-96 3-86 3-76 3-66 3-55 3-44 


15 

4-42 

4-25 

4-07 

3-88 

16 

4-27 

4-10 

3-92 

3-73 

17 

4-14 

3-97 

3-79 

3-61 

18 

4-03 

3-86 

3-68 

3-50 

19 

3-93 

3-76 

3-59 

3-40 


3-79 3-69 3-58 3-48 3-37 3-26 

3-64 3-54 3-44 3-33 3-22 3-11 

3-51 3-41 3-31 3-21 3-10 2-98 

3-40 3-30 3-20 3 10 2-99 2-87 

3-31 3-21 3-11 3 00 2-89 2-78 


20 

3-85 

3-68 

3-50 

3-32 

21 

3-77 

3-60 

3-43 

3-24 

22 

3-70 

3-54 

3-36 

3-18 

23 

3-64 

3-47 

3-30 

3-12 

24 

3-59 

3-42 

3-25 

3-00 


3*22 

3*12 

3-02 

2-92 

2-81 

2-69 

3*15 

3-05 

2-95 

2-84 

2-73 

2*61 

3-08 

2-98 

2*88 

2-77 

2-66 

2-55 

3-02 

2-92 

2-82 

2-71 

2-60 

2-48 

2-97 

2-87 

2-77 

2-66 

2-55 

2-43 


25 

3-54 

3-37 

3-20 

3*01 

26 

3-49 

3-33 

3-15 

2-97 

27 

3-45 

3-28 

3-11 

2-93 

28 

3-41 

3-25 

3-07 

2-89 

29 

3-38 

3*21 

304 

2-86 


2-92 

2-82 

2-72 

2-61 

2-50 

2-38 

2-87 

2-77 

2-67 

2-56 

2-45 

2*33 

2-83 

2-73 

2-63 

2-52 

2-41 

2*25 

2-79 

2-69 

2-59 

2-48 

2-37 

2-29 

2-76 

2-66 

2-56 

2-45 

2-33 

2-24 


30 

3-34 

3-18 

3*01 

2-82 

2-73 

40 

3-12 

2-95 

2-78 

2-60 

2-50 

60 

2-90 

2-74 

2-57 

2-39 

2-29 

120 

2-71 

2-54 

2-37 

2-19 

2-09 

00 

2-52 

2-36 

2-19 

200 

1-90 


2-63 

2-52 

2-42 

2-30 

2-18 

2-40 

2-30 

2*18 

206 

1*93 

2-19 

2-08 

1*96 

1 83 

1-69 

1-98 

1-87 

1-75 

1-61 

1-43 

1-79 

1-67 

1-53 

1-36 

100 
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F ( F ; m , ri ) = 0*999 


Table D7 ( continued ) 







m 





n 

1 

2 

3 

4 

5 

6 

7 

8 

9 


1 

4053 * 

5000 * 

5404 * 

5625 * 

5764 * 

5859 * 

5929 * 

5981 * 

6023 * 

2 

998-5 

999*0 

999*2 

999-2 

999-3 

999-3 

999-4 

999*4 

999-4 

3 

167*0 

148-5 

141*1 

137*1 

134-6 

132-8 

131-6 

130*6 

129*9 

4 

74-14 

61*25 

56*18 

53*44 

51-71 

50-53 

49-66 

49*00 

48-47 

5 

47-18 

37*12 

33-20 

31-09 

29*75 

23*84 

28*16 

27*64 

27-24 

6 

35-51 

27-00 

23*70 

21*92 

20-81 

20*03 

19*46 

19*03 

18-69 

7 

29-25 

21-69 

18*77 

17*19 

16-21 

15*52 

15*02 

14*63 

14*33 

8 

25*42 

18*49 

15*83 

14-39 

13*49 

12-86 

12*40 

12 04 

11*77 

9 

22-86 

16*39 

13-90 

12*56 

11*71 

11-13 

10*70 

10*37 

10*11 

10 

21*04 

14*91 

12*55 

11*28 

10-48 

9*92 

9*52 

9*20 

8*96 

11 

19-69 

13*81 

11*56 

10*35 

9*58 

9-05 

8*66 

8*35 

8*12 

12 

18-64 

12*97 

10*80 

9*63 

8*89 

8-38 

800 

7*71 

7*48 

13 

17*81 

12*31 

10-21 

9*07 

8-35 

7-86 

7*49 

7*21 

6-98 

14 

17*14 

11-78 

9*73 

8*62 

7*92 

7-43 

7*08 

6*80 

6*58 

15 

16*59 

11*34 

9*34 

8-25 

7*57 

7*09 

6*74 

6-47 

6*26 

16 

16-12 

10*97 

900 

7-94 

7*27 

6*81 

6*46 

6*19 

5-98 

17 

15-72 

10*66 

8-73 

7-68 

7-02 

6-56 

6*22 

5*96 

5 * 75 

18 

15*38 

10-39 

8-49 

7-46 

6-81 

6*35 

6*02 

5*76 

5*56 

19 

15*08 

10-16 

8-28 

7*26 

6*62 

6*18 

5*85 

5*59 

5*39 

20 

14-82 

9*95 

8-10 

7*10 

6-46 

6*02 

5*69 

5*44 

5*24 

21 

14*59 

9*77 

7*94 

6*95 

6*32 

5-88 

5*56 

5*31 

5-11 

22 

14*38 

9 61 

7*80 

6*81 

6-19 

5*76 

5-44 

5*19 

4*99 

23 

14*19 

9*47 

7-67 

6-69 

6*08 

5*65 

5*33 

5*09 

4 89 

24 

14*03 

9*34 

7-55 

6*59 

5-98 

5*55 

5*23 

4*99 

4-80 

25 

13*88 

9-22 

7-45 

6-49 

5*88 

5-46 

5 15 

4*91 

4-71 

26 

13-74 

9*12 

7*36 

6*41 

5*80 

5*38 

5*07 

4*83 

4*64 

27 

13-61 

9*02 

7*27 

6-33 

5-73 

5*31 

5*00 

4-76 

4*57 

28 

13-50 

8-93 

7*19 

6*25 

5-66 

5-24 

4*93 

4*60 

4*50 

29 

13-39 

8-85 

7-12 

6-19 

5-59 

5*18 

4*87 

4*64 

4-45 

30 

13-29 

8-77 

7-05 

6*12 

5*53 

5*12 

4*82 

4*58 

4*39 

40 

12-61 

8-25 

6*60 

5*70 

5-13 

4*73 

4*44 

4*21 

4-02 

60 

11-97 

7*76 

6*17 

5-31 

4-76 

4-37 

4*09 

3-87 

3*69 

120 

11*38 

7-32 

5*79 

4*95 

4*42 

4-04 

3*77 

3*55 

3*38 

00 

10-83 

6-91 

5*42 

4*62 

4-10 

3*74 

3*47 

3*27 

3*10 


* Multiply these entries by 100 . 
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Table D 7 ( continued ) 


                                      m
   n     10     12     15     20     24     30     40     60    120      ∞

   1  6056*  6107*  6158*  6209*  6235*  6261*  6287*  6313*  6340*  6366*
   2  999.4  999.4  999.4  999.4  999.5  999.5  999.5  999.5  999.5  999.5
   3  129.2  128.3  127.4  126.4  125.9  125.4  125.0  124.5  124.0  123.5
   4  48.05  47.41  46.76  46.10  45.77  45.43  45.09  44.75  44.40  44.05
   5  26.92  26.42  25.91  25.39  25.14  24.87  24.60  24.33  24.06  23.79
   6  18.41  17.99  17.56  17.12  16.89  16.67  16.44  16.21  15.99  15.75
   7  14.08  13.71  13.32  12.93  12.73  12.53  12.33  12.12  11.91  11.70
   8  11.54  11.19  10.84  10.48  10.30  10.11   9.92   9.73   9.53   9.33
   9   9.89   9.57   9.24   8.90   8.72   8.55   8.37   8.19   8.00   7.81
  10   8.75   8.45   8.13   7.80   7.64   7.47   7.30   7.12   6.94   6.76
  11   7.92   7.63   7.32   7.01   6.85   6.68   6.52   6.35   6.17   6.00
  12   7.29   7.00   6.71   6.40   6.25   6.09   5.93   5.76   5.59   5.42
  13   6.80   6.52   6.23   5.93   5.78   5.63   5.47   5.30   5.14   4.97
  14   6.40   6.13   5.85   5.56   5.41   5.25   5.10   4.94   4.77   4.60
  15   6.08   5.81   5.54   5.25   5.10   4.95   4.80   4.64   4.47   4.31
  16   5.81   5.55   5.27   4.99   4.85   4.70   4.54   4.39   4.23   4.06
  17   5.58   5.32   5.05   4.78   4.63   4.48   4.33   4.18   4.02   3.85
  18   5.39   5.13   4.87   4.59   4.45   4.30   4.15   4.00   3.84   3.67
  19   5.22   4.97   4.70   4.43   4.20   4.14   3.99   3.84   3.68   3.51
  20   5.08   4.82   4.56   4.29   4.15   4.00   3.86   3.70   3.54   3.38
  21   4.95   4.70   4.44   4.17   4.03   3.88   3.74   3.58   3.42   3.26
  22   4.83   4.58   4.33   4.06   3.92   3.78   3.63   3.48   3.32   3.15
  23   4.73   4.48   4.23   3.96   3.82   3.68   3.53   3.38   3.22   3.05
  24   4.64   4.39   4.14   3.87   3.74   3.59   3.45   3.29   3.14   2.97
  25   4.56   4.31   4.06   3.79   3.66   3.52   3.37   3.22   3.06   2.89
  26   4.48   4.24   3.99   3.72   3.59   3.44   3.30   3.15   2.99   2.82
  27   4.41   4.17   3.92   3.66   3.52   3.38   3.23   3.08   2.92   2.75
  28   4.35   4.11   3.86   3.60   3.46   3.32   3.18   3.02   2.86   2.69
  29   4.29   4.05   3.80   3.54   3.41   3.27   3.12   2.97   2.81   2.64
  30   4.24   4.00   3.75   3.49   3.36   3.22   3.07   2.92   2.76   2.59
  40   3.87   3.64   3.40   3.15   3.01   2.87   2.73   2.57   2.41   2.23
  60   3.54   3.31   3.08   2.83   2.69   2.55   2.41   2.25   2.08   1.89
 120   3.24   3.02   2.78   2.53   2.40   2.26   2.11   1.95   1.76   1.54
   ∞   2.96   2.74   2.51   2.27   2.13   1.99   1.84   1.66   1.45   1.00

* Multiply these entries by 100.



Table D 8 

This table gives the sample size needed, for values of P[I] = a and P[II] =1 - /?, for a test on a single mean with unknown 
standard deviation. For example, if the null hypothesis is H 0 :n = fi 0 , and the alternative is H a \ji < n 0 the test statistic 
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Table D9 

This table gives the sample size needed, for given values of P[I] = a and P[II] =1 — ji, for a test of the hypothesis of the equality 
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»n O O >o 
O -1 ^ (N M 
6 6 6 6 6 


030 123 87 61 0-30 

035 110 90 64 102 45 0-35 

040 85 70 100 50 108 78 35 0-40 

045 118 68 101 55 105 79 39 108 86 62 28 0-45 

050 96 55 106 82 45 106 86 64 32 88 70 51 23 0-50 
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APPENDIX D 


Table D10

This table gives the value of the ratio R of the population variance σ² to a
standard variance σ₀², which is undetected with probability β = 1 − P in a χ² test
at significance level α of an estimate s² of σ² based on n degrees of freedom. For
R < 1 enter the table with β' = α, α' = β and R' = 1/R.
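The entries of this table can be checked numerically. The sketch below rests on an assumption that is not stated in the caption: that the tabulated ratio is R = χ²(1 − α; n) / χ²(β; n), the upper α point of the χ² distribution with n degrees of freedom divided by its lower β point. This form appears to reproduce entries such as 8.444 for n = 1, α = 0.05, β = 0.5. The helper names (`gamma_p`, `chi2_quantile`, `variance_ratio`) are illustrative only.

```python
import math


def gamma_p(a, x):
    """Regularized lower incomplete gamma function P(a, x)."""
    if x <= 0.0:
        return 0.0
    ln_pref = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1.0:
        # Power series, which converges quickly for x < a + 1.
        ap, term = a, 1.0 / a
        total = term
        for _ in range(10000):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return total * math.exp(ln_pref)
    # Continued fraction for Q(a, x) = 1 - P(a, x), modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = b + an / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(ln_pref) * h


def chi2_quantile(p, n):
    """Value x with P(chi-square with n degrees of freedom <= x) = p, by bisection."""
    lo, hi = 0.0, 1.0
    while gamma_p(n / 2.0, hi / 2.0) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_p(n / 2.0, mid / 2.0) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def variance_ratio(alpha, beta, n):
    """Assumed form of the tabulated ratio: upper alpha point over lower beta point."""
    return chi2_quantile(1.0 - alpha, n) / chi2_quantile(beta, n)


if __name__ == "__main__":
    for n in (1, 2, 10):
        print(n, round(variance_ratio(0.05, 0.5, n), 3))
```

If the assumed form is right, the printed ratios should agree with the α = 0.05, β = 0.5 column of the table to the quoted precision.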


                      α = 0.01                            α = 0.05
   n  β = 0.01 β = 0.05  β = 0.1  β = 0.5   β = 0.01 β = 0.05  β = 0.1  β = 0.5

   1    42,240    1,687    420.2    14.58     25,450    977.0    243.3    8.444
   2     458.2    89.78    43.71    6.644      298.1    58.40    28.43    4.322
   3     98.79    32.24    19.41    4.795      68.05    22.21    13.37    3.303
   4     44.69    18.68    12.48    3.955      31.93    13.35    8.920    2.826
   5     27.22    13.17    9.369    3.467      19.97    9.665    6.875    2.544
   6     19.28    10.28    7.628    3.144      14.44    7.699    5.713    2.354
   7     14.91    8.524    6.521    2.911      11.35    6.491    4.965    2.217
   8     12.20    7.352    5.757    2.736      9.418    5.675    4.444    2.112
   9     10.38    6.516    5.198    2.597      8.103    5.088    4.059    2.028
  10     9.072    5.890    4.770    2.484      7.156    4.646    3.763    1.960
  12     7.343    5.017    4.159    2.312      5.889    4.023    3.335    1.854
  15     5.847    4.211    3.578    2.132      4.780    3.442    2.925    1.743
  20     4.548    3.462    3.019    1.943      3.802    2.895    2.524    1.624
  24     3.959    3.104    2.745    1.842      3.354    2.630    2.326    1.560
  30     3.403    2.752    2.471    1.735      2.927    2.367    2.125    1.492
  40     2.874    2.403    2.192    1.619      2.516    2.103    1.919    1.418
  60     2.358    2.046    1.902    1.490      2.110    1.831    1.702    1.333
 120     1.829    1.661    1.580    1.332      1.686    1.532    1.457    1.228
   ∞     1.000    1.000    1.000    1.000      1.000    1.000    1.000    1.000



Table D11

This table shows the ratio R of two population variances, i.e. R = σ₂²/σ₁², which remains undetected with probability β = 1 − P
in a variance ratio test at significance level α of the ratio s₂²/s₁² of estimates of the two variances, each being based on n degrees of
freedom.
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